Chapter 5 More on thematic maps
5.1 Introduction
In this session we are going to discuss some additional features around thematic maps we did not cover in week 3. We are going to discuss how to address some of the problems we confront when we are trying to use use choropleth maps, as well as some alternatives to point based maps. We will also introduce the modifieable area unit problem.
Before we do any of this, we need to load the libraries we will use today:
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3
library(tmap)
library(sp)
library(spdep)
## Loading required package: Matrix
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source'))`
library(DCluster)
## Loading required package: boot
## Loading required package: MASS
library(cartogram)
library(ggmap)
## Loading required package: ggplot2
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
There are also some new libraries we will use. If you don’t already have these you will need to install them.
5.1.1 Pro-tip: do I need to install this package?
You might have noticed that in your list of available packages you might see more than you remember downloading. The idea of dependencies has come up throughout the semester. Packages have dependencies when their code is dependent on (uses code from) another package. For example, if I write some code that I think will be useful, so I release this in the form of the package “rekaR”, but I use ggplot2 in the code, then ggplot2 will be a dependency of rekaR. As a default, R will install all the dependencies for a package when you install your package. So this way you might end up with some packages there that you didn’t realise you had.
Why am I telling you this? Well you should always check if you have a package, before installing it. And I wanted to share with you some neat code from a Stackoverflow discussion (if you are not yet familiar with Stackoverflow you have not been Google-ing your error messages enough) here to do this. I’ll comment it a bit, so you can follow along what it does but you don’t have to if you don’t want to. This is just an optional extra.
So as a first step, you have to assign a list of all the packages you have to check to an object. Let’s say I tell you that today we will be using the following packaes: “sp”, “rgdal”, “classInt”, “RColorBrewer”, “ggplot2”, “hexbin”, “ggmap”, “XML”, and “dplyr”. Then you can add these to an object called libs, using the c() function:
libs <- c("sf", "tmap", "sp", "spdep", "DCluster", "cartogram")
Now you can run the below bit of code, and you will see in the console an output of what is and isn’t installed, as well as install the packages that are not!
for (x in libs){ #cycle through each item in libs object
if(x %in% rownames(installed.packages()) == FALSE) { #if the package is not installed
print(paste0("installing ", x, "...")) #print a message to tell me
install.packages(x) #and then install the packages
}
else{ #otherwise (if it is installed)
print (paste0(x, " is already installed ")) #print a message to tell me
}
library(x, character.only = TRUE) #and then load the packages
}
## [1] "sf is already installed "
## [1] "tmap is already installed "
## [1] "sp is already installed "
## [1] "spdep is already installed "
## [1] "DCluster is already installed "
## [1] "cartogram is already installed "
As you can see if you read through the comments there, this bit of code checks each package in the list you pass to tbe libs object when you create it, and if it is not installed it installs for you, and if it is, it just loads it for you. It can be a handy bit of code to keep around.
5.1.2 Data
Then we will bring back data about homicide across US counties. Some of you that have taken previous classes with us may be familiar with this dataset. The dataset was used as the basis for this study. We will be using it also later on in the semester. It contains data on homicide counts and rates for various decades across the US as well as information on structural factors often thought to be associated with violence.
##R in Windows have some problems with https addresses, that's why we need to do this first:
urlfile<-'https://s3.amazonaws.com/geoda/data/ncovr.zip'
download.file(urlfile, 'ncovr.zip')
#Let's unzip and create a new directory (ncovr) in our working directory to place the files
unzip('ncovr.zip', exdir = 'ncovr')
Remember that to treat the data as spatial we need to load the shapefile. With that we can create a spatial object:
shp_name <- "ncovr/ncovr/NAT.shp"
ncovr_sf <- st_read(shp_name)
## Reading layer `NAT' from data source `/Users/reka/Dropbox (The University of Manchester)/crimemapping_textbook_bookdown/ncovr/ncovr/NAT.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3085 features and 69 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -124.7314 ymin: 24.95597 xmax: -66.96985 ymax: 49.37173
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
5.2 Mapping rates, learning from disease mapping
In previous sessions we discussed how to map rates. It seems a fairly straightoforward issue, you calculate a rate by dividing your numerator (eg: number of crimes) by your demoninator (eg: daytime population). You get your variable with the relevant rate and you map it using a choropleth map. However, things are not always that simple. Rates are funny animals. Let’s look at the ncvor data.
summary(ncovr_sf$HR60)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 2.783 4.504 6.885 92.937
We can see that the county with the highest homicide rate in the 1960s had a rate of 92.937 homicides per 100,000 individuals. That is very high. Just to put it into context in the UK is about 0.92. Where is that place? I can tell you is a place call Borden. Check it out:
borden <- subset(ncovr_sf, NAME == "Borden")
borden$HR60
## [1] 92.9368
Borden county in Texas. You may be thinking… “Texas Chainsaw Massacre” perhaps? No, not really. Ed Gein, who inspired the film, was based and operated in Wisconsin. Borden claim to fame is that it was named after Gail Borden, the inventor of condensed milk. So, what’s going on here? Why do we have a homicide rate in Borden that makes it look like a war zone or the setting for Midsomer Murders? Is is that it is only one of the six counties where alcohol is banned in Texas (and people are consequently going nuts?).
Check this out too:
borden$HC60
## [1] 1
What? A total homicide count of 1. How can a county with just one homicide have a rate that makes it look like the most dangerous place in the US?
borden$PO60
## [1] 1076
Well, there were about 1076 people living there.
summary(ncovr_sf$PO60)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 208 9417 18408 57845 39165 7781984
If you contrast that with the average county in the US, that’s tiny. One homicide in such a small place can end up producing a big rate. Remember that the rate is simply dividing the number of relevant events by the exposure variable (in this case population) and multiplying by a constant (in this case 100,000 since we expressed crime rates in those terms). Most times Borden is a very peaceful place:
borden$HR70
## [1] 0
borden$HR80
## [1] 0
borden$HR90
## [1] 0
It has a homicide rate of 0 in most decades. But it only takes one homicide and, bang, it goes top of the league. So a standard map of rates is bound to be noisy. There is the instability that is introduced by virtue of having areas that may be sparsely populated and in which one single event, like in this case, will produce a very noticeable change in the rate. In fact, if you look at the counties with the highest homicide rate in the ncovr dataset you will notice all of them are places like Borden, areas that are sparsely populated, not because they are that dangerous, but because of the instability of rates.
This is a problem that was first noted by epidemiologists doing disease mapping. But a number of other disciplines have now noted this and used some of the approaches developed by public health researchers that confronted this problem when producing maps of disease (PRO TIP: techniques and approaches used by spatial epidemiologists are very similar to those used by criminologists -in case you ever think of changing careers or need inspiration for how to solve a crime analysis problem).
One way of dealing with this is by smoothing the rates. This basically as the word implies aims for a smoother representation that avoids hard spikes associated with random noise. There are different ways of doing that. Some ways use a non-spatial approach to smoothing, using something called a empirical bayesian smoother. How does this work? This approach takes the raw rates and tries to “shrunk” them towards the overall average. What does this mean? Essentially, we compute a weighted average between the raw rate for each area and the global average across all areas, with weights proportional to the underlying population at risk. What this procedure does is to have the rates of smaller areas (those with a small population at risk) to have their rates adjusted considerably (brought closer to the global average), whereas the rates for the larger areas will barely change.
Here we are going to introduce the approach implemented in DCluster, a package developed for epidemiological research and detection of clusters of disease.
res <- empbaysmooth(ncovr_sf$HC60, ncovr_sf$PO60)
In the new object we generate, which is a list, you have an element which contains the computed rates. We can add those to our dataset:
ncovr_sf$HR60EBS <- res$smthrr * 100000
Instead of shrinking to the global rate, we can shrink to a rate based on the neighbours of each county. If instead of shrinking to a global rate, we shrink to a local rate, we may be able to take unobserved heterogeneity into account; for this we need the list of neighbours (we will discuss this code in a later session, so for now just trust us we are computing the rate of the areas that surround each country):
ncovr_sp <- as(ncovr_sf, "Spatial")
w_nb <- poly2nb(ncovr_sp, row.names=ncovr_sp$FIPSNO)
eb2 <- EBlocal(ncovr_sf$HC60, ncovr_sf$PO60, w_nb)
ncovr_sf$HR60EBSL <- eb2$est * 100000
We can now plot the maps and compare them:
current_style <- tmap_style("col_blind")
## tmap style set to "col_blind"
## other available styles are: "white", "gray", "natural", "cobalt", "albatross", "beaver", "bw", "classic", "watercolor"
map1<- tm_shape(ncovr_sf) +
tm_fill("HR60", style="quantile", title = "Raw rate", palette = "Reds") +
tm_layout(legend.position = c("left", "bottom"),
legend.title.size = 0.8,
legend.text.size = 0.5)
map2<- tm_shape(ncovr_sf) +
tm_fill("HR60EBS", style="quantile", title = "EB Smooth", palette = "Reds") +
tm_layout(legend.position = c("left", "bottom"),
legend.title.size = 0.8,
legend.text.size = 0.5)
map3<- tm_shape(ncovr_sf) +
tm_fill("HR60EBSL", style="quantile", title = "Local Smooth", palette = "Reds") +
tm_layout(legend.position = c("left", "bottom"),
legend.title.size = 0.8,
legend.text.size = 0.5)
tmap_arrange(map1, map2, map3)

Notice that the quantiles are not the same, so that will make your comparison difficult.
5.3 Binning points
In GIS it is often difficult to present point-based data because in many instances there are several different points and data symbologies that need to be shown. As the number of different data points grows they can become complicated to interpret and manage which can result in convoluted and sometimes inaccurate maps. This becomes an even larger problem in web maps that are able to be depicted at different scales because smaller scale maps need to show more area and more data. This makes the maps convoluted if multiple data points are included.
In many maps there are so many data points included that little can be interpreted from them. In order to reduce congestion on maps many GIS users and cartographers have turned to a process known as binning.
Binning is defined as the process of grouping pairs of locations based on their distance from one another. These points can then be grouped as categories to make less complex and more meaningful maps.
Researchers and practitioners often require a way to systematically divide a region into equal-sized portions. As well as making maps with many points easier to read, binning data into regions can help identify spatial influence of neighbourhoods, and can be an essential step in developing systematic sampling designs.
This approach to binning generates an array of repeating shapes over a user-specified area. These shapes can be hexagons, squares, rectangles, triangles, circles or points, and they can be generated with any directional orientation.

5.3.1 The Binning Process
Binning is a data modification technique that changes the way data is shown at small scales. It is done in the pre-processing stage of data analysis to convert the original data values into a range of small intervals, known as a bin. These bins are then replaced by a value that is representative of the interval to reduce the number of data points.
Spatial binning (also called spatial discretization) discretizes the location values into a small number of groups associated with geographical areas or shapes. The assignment of a location to a group can be done by any of the following methods: - Using the coordinates of the point to identify which “bin” it belongs to. - Using a common variable in the attribute table of the bin and the point layers.
5.3.2 Different Binning Techniques
Binning itself is a general term used to describe the grouping of a dataset’s values into smaller groups (Johnson, 2011). The bins can be based on a variety of factors and attributes such as spatial and temporal and can thus be used for many different projects.
5.3.2.1 Choropleth maps
You might be thinkging, “grouping points into a larger spatial unit, haven’t we already done this when making choropleth maps?”. In a way you are right. Choropleth maps are another type of map to that uses binning. Proportional symbol and choropleth maps group similar data points together to show a range of data instead of many individual points. We’ve covered this extensively, and is generally the best approch to consider spatial grouping of your point variables, because the polygons (shapes) to which you are aggregating your points are meaningful. You can group into LSOAs because you want to show variation in neighbourhoods. Or you can group into police force areas because you want to look at differences between those units of analysis. But sometimes there is just not a geography present to meet your needs.
Let’s say you are conducting some days of action in Manchester city centre, focusing on antisocial behaviour. You are going to put up some information booths and staff them with officers to engage with the local population about antiscoail behaviour. For these to be most effective, as an analyst you decide that they should go into the areas with the highest count of antisocial beaviour. You want to be very specific about where you put these as well, and so LSOA level would be too broad, you want to zoom in more. One approach can be to split central Manchester into some smaller polygons, and just calculate the number of antisocial behaviour incidents recorded in each. That way you can then decide to put your information booths somewhere inside the top 5 highest count bins.
5.3.2.2 Rectangular binning
The aggregation of incident point data to regularly shaped grids is used for many reasons such as normalizing geography for mapping or to mitigate the issues of using irregularly shaped polygons created arbitrarily (such as county boundaries or block groups that have been created from a political process). Regularly shaped grids can only be comprised of equilateral triangles, squares, or hexagons, as these three polygon shapes are the only three that can tessellate (repeating the same shape over and over again, edge to edge, to cover an area without gaps or overlaps) to create an evenly spaced grid.
Rectangular binning is the simplest binning method and as such it heavily used. However, there are some reasons why rectangular bins are less preferable over hexagonal bins. Before we cover this, let’s have a look at hexagonal bins.
5.3.2.3 Hexagonal binning
In many applications binning is done using a technique called hexagonal binning. This technique uses hexagon shapes to create a grid of points and develops a spatial histogram that shows different data points as a range or group of pairs with common distances and directions. In hexagonal binning the number of points falling within a particular rectangular or hexagon in a gridded surface is what makes the different colors to easily visualize data (Smith, 2012). Hexagonnal binning was first developed in 1987 and today “hexbinning” is conducted by laying a hexagonal grid on top of 2-dimensional data (Johnson, 2011). Once this is done users can conduct data point counts to determine the number of points for each hexagon (Johnson, 2011). The bins are then symbolized differently to show meaningful patterns in the data.
So how can we use hexbinning to solve our antisocial behaviour days of action task? Well let’s say we split Manchester city centre into hexagons, and count the number of antisocial behaviour instances in these. We can then identify the top hexagons, and locate our booths somewhere within these.
First make sure you have the appropriate packages loaded:
library(ggplot2)
library(ggmap)
library(hexbin)
Also let’s get some data. You could go and get this data yourself from police.uk, we’ve been through all the steps for downloading data from there a few times now. But for now, I have a tidied set of data ready for you. This data is one year’s worth of antisocial behaviour from the police.uk data, from May 2016 to May 2017, for the borough of Manchester.
We can take our GMP crimes data, and select only the cases from ASB using the crime.type variable. If you want, however, I have already done this, so you can also download from my dropbox using the link here:
manchester_asb <- read.csv("https://www.dropbox.com/s/4tk0aps3jfd9nh4/manchester_asb.csv?dl=1")
This is currently just a text dataframe, so we need to let R know that actually this is a spatial object, who’s geometry can be find in its longitude and latitude coordinates. As we have long/lat we can assure it’s in WGS 84 projection.
ma_spatial <- st_as_sf(manchester_asb, coords = c("Longitude", "Latitude"),
crs = 4326, agr = "constant")
Now one thing that this does is it consumes our Long and Lat columnsinto a geometry attribute. This is generally OK, but for the binning we will do, we would like to have them as separate coordinates. To do this, we can use a bespoke function, sfc_as_cols() created by Josh M. London in response to this issue opened on github for the sf package. To create this function here, run the below code:
sfc_as_cols <- function(x, names = c("x","y")) {
stopifnot(inherits(x,"sf") && inherits(sf::st_geometry(x),"sfc_POINT"))
ret <- sf::st_coordinates(x)
ret <- tibble::as_tibble(ret)
stopifnot(length(names) == ncol(ret))
x <- x[ , !names(x) %in% names]
ret <- setNames(ret,names)
dplyr::bind_cols(x,ret)
}
As we are not covering making your own function much here, don’t worry too much about the above code, but if you’re curious, raise your hand in the labs and we can come around and talk through it.
Now finally, let’s use this function we just created to extract the coords to some columns. So in this function, we have to pass as parameters the name of the dataframe, ma_spatial in our case, and also what we want the columnd to be called in a concatenated list (created with the c() function). Let’s call these “lng” and “lat”:
ma_spatial <- sfc_as_cols(ma_spatial, c("lng", "lat"))
As a first step, we can plot asb in the borough of Manchester using simple ggplot! Remember the data visualisation session from weeks ago? We discussed how ggplot is such a great tool for building visualisations, because you can apply whatever geometry best suits your data. So for us to just have a look at the hexbinned version of our point data of antisocial behaviour, we can use the stat_binhex() function. We can also recreate the thematic map element, as we can use the frequency of points in each hex to shade each hexbin from white (least number of incidents) to red (most nuber of incidents).
So let’s have a go:
ggplot(ma_spatial, aes(lng, lat)) + #define data and variables for x and y axes
stat_binhex() + #add binhex layer (hexbin)
scale_fill_gradientn(colours = c("white","red"), name = "Frequency") #add shading based on number of ASB incidents

Neat, but doesn’t quite tell us where that really dark hexbon actually is. So it would be much better if we could do this with a basemap as the backrgound, rather than our grey ggplot theme.
Now, we can apply the same code as we used above, for the ggplot, to this ggmap, to add our hexbins on top of this basemap:
library(ggspatial) #load ggspatial package for background map tiles
ggplot(ma_spatial, aes(x = lng, y = lat)) +
annotation_map_tile() +
stat_binhex(alpha=0.7) + #add binhex layer (hexbin)
scale_fill_gradientn(colours = c("white","red"), name = "Frequency") #add shading based on number of ASB incidents
## Zoom: 10

Now this should give you some more context! Woo!
So I mentioned we’d go over some reasons why you should consider aggregating into a hexagon grid rather than other shape:
- Hexagons reduce sampling bias due to edge effects of the grid shape. The edge effects of bounded space refers to the problem of truncated data that can skew the results of subsequent analyses (we’ll get to this in the next section). This is related to the low perimeter-to-area ratio of the shape of the hexagon. A circle has the lowest ratio but cannot tessellate to form a continuous grid. Hexagons are the most circular-shaped polygon that can tessellate to form an evenly spaced grid.
- This circularity of a hexagon grid allows it to represent curves in the patterns of your data more naturally than square grids.
- When comparing polygons with equal areas, the more similar to a circle the polygon is, the closer to the centroid the points near the border are (especially points near the vertices). This means that any point inside a hexagon is closer to the centroid of the hexagon than any given point in an equal-area square or triangle would be (this is due to the more acute angles of the square and triangle versus the hexagon).
- Hexagons are preferable when your analysis includes aspects of connectivity or movement paths. Due to the linear nature of rectangles, fishnet grids can draw our eyes to the straight, unbroken, parallel lines which may inhibit the underlying patterns in the data. Hexagons tend to break up the lines and allow any curvature of the patterns in the data to be seen more clearly and easily. This breakup of artificial linear patterns also diminishes any orientation bias that can be perceived in fishnet grids.
- If you are working over a large area, a hexagon grid will suffer less distortion due to the curvature of the earth than the shape of a fishnet grid.
- Finding neighbors is more straightforward with a hexagon grid. Since the edge or length of contact is the same on each side, the centroid of each neighbor is equidistant. However, with a fishnet grid, the Queen’s Case (above/below/right/left) neighbor’s centroids are N units away, while the centroids of the diagonal (Rook) neighbors are farther away (exactly the square root of 2 times N units away).
- Since the distance between centroids is the same in all six directions with hexagons, if you are using a distance band to find neighbors or are using the Optimized Hot Spot Analysis, Optimized Outlier Analysis or Create Space Time Cube By Aggregating Points tools, you will have more neighbors included in the calculations for each feature if you are using hexagonal grid as opposed to a fishnet grid.

Now, to again illustrate the differences of different approaches, let’s see what this map would look like with:
- rectangular binning:
ggplot(ma_spatial, aes(x = lng, y = lat)) +
annotation_map_tile() +
stat_bin2d(alpha=0.7) +
scale_fill_gradientn(colours = c("white","red"),
name = "Frequency")
## Zoom: 10

- hexagonal binning:
ggplot(ma_spatial, aes(x = lng, y = lat)) +
annotation_map_tile() +
stat_binhex(alpha=0.7) +
scale_fill_gradientn(colours = c("white","red"),
name = "Frequency")
## Zoom: 10

- a simple “heatmap” (we will discuss these more thoroughly next week):
ggplot(ma_spatial, aes(x = lng, y = lat)) +
annotation_map_tile() +
stat_density2d(aes(fill = ..level.., # value corresponding to discretized density estimates
alpha = ..level..),
geom = "polygon") + # creates the bands of differenc colors
## Configure the colors, transparency and panel
scale_fill_gradientn(colours = c("white","red"),
name = "Frequency")
## Zoom: 11

5.3.3 Homework 5.1
Look at the difference between the three maps (hex, rectangle, and density). How would your conclusions change if you were given these maps? Would you make different decisions about where to place your booths for the days of action? Why or why not? Discuss.
5.3.4 Multivariate binning
Multivariate binning is another binning method that lets you visualise slightly more complex data. In this method there can be many different variables consisting of different types of data. Like other binning methods the data is typically grouped with the sum or average of the data. Different types of symbology (such as size, shape and color) can also be used to represent this data as well.
We won’t be covering this here but just so you can have a look at some examples here.
5.3.5 Benefits of Binning
Because of the plethora of data types available and the wide variety of projects being done in GIS, binning is a popular method for mapping complex data and making it meaningful. Binning is a good option for map makers as well as users because it makes data easy to understand and it can be both static and interactive on many different map scales. If every different point were shown on a map it would have to be a very large scale map to ensure that the data points did not overlap and were easily understood by people using the maps.
According to Kenneth Field, an Esri Research Cartographer, “Data binning is a great alternative for mapping large point-based data sets which allows us to tell a better story without interpolation. Binning is a way of converting point-based data into a regular grid of polygons so that each polygon represents the aggregation of points that fall within it.”
By using binning to create categories of data maps are easier to understand, more accurate and more visually appealing.
Hexbin plots can be viewed as an alternative to scatter plots. The hexagon-shaped bins were introduced to plot densely packed sunflower plots. They can be used to plot scatter plots with high-density data.
5.4 A note of caution: MAUP
Now that we’ve shown you how to do a lot of spatial crime analysis, we wanted to close with some words of caution. Remember that everything you’ve learned here are just tools that you will be applying to data you are working with, but it’s up to you, the researcher, the analyst, the domain expert, to apply and use these with careful consideration and cautions. This discussion is very much part of spatial crime analysis, and an important field of thought.
I borrow here from George Renghert and Brian Lockwood:
When spatial analysis of crime is conducted, the analyst should not ignore the spatial units that data are aggregated into and the impact of this choice on the interpretation of findings. Just as several independent variables are considered to determine whether they have statistical significance, a consideration of multiple spatial units of analysis should be made as well, in order to determine whether the choice of aggregation level used in a spatial analysis can result in biased findings.
In particular, they highlight four main issues inherent in most studies of space: - issues associated with politically bounded units of aggregation, - edge effects of bounded space - the modifiable aerial unit problem (MAUP) - and ways in which the results of statistical analyses can be manipulated by changes in the level of aggregation.
In this lab we will focus on MAUP, but if you are interested in this kind of work, you should definitely read their paper to consider the other issues as well. There are techniques that can be used to alleviate each of the methodological difficulties, and they are described in accessible detail in their paper: Rengert, George F., and Brian Lockwood. “Geographical units of analysis and the analysis of crime.” Putting crime in its place. Springer, New York, NY, 2009. 109-122.
5.4.1 What is MAUP?
The Modifiable Areal Unit Problem (MAUP) is an important issue for those who conduct spatial analysis using units of analysis at aggregations higher than incident level. It is one of the better-known problems in geography and spatial analysis. This phenomenon illustrates both the need for considering space in one’s analysis, and the fundamental uncertainties that accompany real-world analysis.
The MAUP is "a problem arising from the imposition of artificial units of spatial reporting on continuous geographical phenomena, leading to artifacts or errors are created when one groups data into units for analysis.
The classic text on MAUP is the 1983 paper Openshaw, Stan. “The modifiable areal unit problem. CATMOG (Concepts and techniques in modern geography) 38.” Geo Abstracts, Norwich. 1984..
There are two distinct types of MAUP: Scale (i.e. determining the appropriate size of units for aggregation) and zone (i.e. drawing boundaries or grouping).
5.4.1.1 Scale
The scale problem involves results that change based on data that are analyzed at higher or lower levels of aggregation (Changing the number of units). For example, evaluating data at the state level vs. Census tract level.
The scale problem has moved to the forefront of geographical criminology as a result of the recent interest in small-scale geographical units of analysis. It has been suggested that smaller is better since small areas can be directly perceived by individuals and are likely to be more homogenous than larger areas. - Gerell, Manne. “Smallest is better? The spatial distribution of arson and the modifiable areal unit problem.” Journal of quantitative criminology 33.2 (2017): 293-318.
5.4.1.2 Zone
The zonal problem involves keeping the same scale of research (say, at the state level) but changing the actual shape and size of those areas.
The basic issue with the MAUP is that aggregate units of analysis are often arbitrarily produced by whom ever is in charge of creating the aggregate units. A classic example of this problem is known as Gerrymandering. Gerrymandering involves shaping and re-shaping voting districts based on the political affiliations of the resident citizenry.
The inherent problem with the MAUP and with situations such as Gerrymandering is that units of analysis are not based on geographic principles, and instead are based on political and social biases. For researchers and practitioners the MAUP has very important implications for research findings because it is possible that as arbitrarily defined units of analysis change shape findings based on these units will change as well.
When spatial data are derived from counting or averaging data within areal units, the form of those areal units affects the data recorded, and any statistical measures derived from the data. Modifying the areal units therefore changes the data. Two effects are involved: a zoning effect arising from the particular choice of areas at a given scale; and an aggregation effect arising from the extent to which data are aggregated over smaller or larger areas. The modifiable areal unit problem arises in part from edge effect.
If you’re interested, in particular about politics and voting, you can read this interesting piece to learn more about gerrymandering
5.4.2 Why does MAUP matter?
The practical implications of MAUP are immense for almost all decision-making processes involving GIS technology, since with the availability of aggregated maps, policy could easily focus on issues and problems which might look different if the aggregation scheme used were changed .
All studies based on geographical areas are susceptible to MAUP. The implications of the MAUP affect potentially any area level data, whether direct measures or complex model-based estimates. Here are a few examples of situations where the MAUP is expected to make a difference:
- The special case of the ecological fallacy is always present when Census area data are used to formulate and evaluate policies that address problems at individual level, such as deprivation. Also, it is recognised that a potential source of error in the analysis of Census data is ‘the arrangement of continuous space into defined regions for purposes of data reporting’
- The MAUP has an impact on indices derived from areal data, such as measures of segregation, which can change significantly as a result of using different geographical levels of analysis to derive composite measures .
- The choice of boundaries for reporting mortality ratios is not without consequences: when the areas are too small, the values estimated are unstable, while when the areas are too large, the values reported may be over-smoothed, i.e. meaningful variation may be lost .
- Gerell, Manne. “Smallest is better? The spatial distribution of arson and the modifiable areal unit problem.” Journal of quantitative criminology 33.2 (2017): 293-318.
5.4.3 What can we do?
Most often you will just have to remain aware of the MAUP and it’s possible effects. There are some techniques, that can help you address these issues, and the chapter pointed out at the beginning of this section is a great place to start to explore these. It is possible to use also an alternative, zone-free approach to mapping these crime patterns, perhaps by using kernel density estimation. Here we model the relative density of the points as a density surface - essentially a function of location (x,y) representing the relative likelihood of occurrence of an event at that point. We have covered KDE elsewhere in this course.
For the purposes of this course, it’s enough that you know of, and understand the MAUP and its implications. Always be smart when choosing your appropriate spatial unit of analysis, and when you use binning of any form, make sure you consider how and if your conclusions might change compared to another possible approach.
5.4.4 Homework 5.2
Look at the question for homework 5.1 about the three maps of binning and the hotspot map. Answer this question again, but now in light of what you have learned about MAUP.
5.5 Replacing polygons with grid or hex shapes
When you have meaningful spatial units of analysis in your polygons, for example you are interested specifically in Local Autorities, it might make sense to stick with what we did last week, and aggregate the points into these polygons to cheare thematic maps. However, while thematic maps are an accessible and visually appealing method for displaying spatial information, they can also be highly misleading. Irregularly shaped polygons and large differences in the size of areas being mapped can introduce misrepresentation. The message researchers want to get across might be lost, or even worse, misdirect the viewers to erroneous conclusions. This article provides a helpful discussion of the problem illustrating the case with UK election maps. It is worth reading.
Fortunately, there are many methods in R to enhance the legibility of geographic information and the interpretability of what it is trying to be communicated.
Broadly, the options are:
- cartogram
- hexmap
- grid
Selecting the appropriate method might depend on the research question being posed (e.g. clustering) and the data itself. Even once a method has been selected, there are different ways of operationalising them. We focus on two methods that have been developed recently: the geogrid package on CRAN and some [open code]https://rpubs.com/profrichharris/hexograms shared by Richard Harris to make grid and hex maps where instead of creating a grid, we reshape the existing polygons into grid or hex shapes. We will also speak about cartograms, but I’ll leave that to the next section.
Let’s explore this using the example of the results of the 2016 EU referendum at Local Authority level, where remain areas clustered in London. A simple thematic map does not necessarily communicate this well because Local Authorities are both small and densely populated in London.
You can download the full set of EU referendum result data as a csv from the Electoral Commission webside. Let’s read it straight into R:
eu_ref <- read.csv("https://www.electoralcommission.org.uk/__data/assets/file/0014/212135/EU-referendum-result-data.csv")
OKAY, now we need a shapefile to join it to. Remember when we got the Manchester lsoa shapefile with the boundary selector? Let’s go back, and this time get Local Authority Districts for England.
In this case that means select “English Districts, UAs and London Boroughs, 2011”:

Once you have the file, download, extract (unzip) and put the folder in your working directory. Mine is in a subfolder in my working directory called data, so I point R inside that folder to find my shape file.
las <- st_read("data/England_lad_2011_gen/england_lad_2011_gen.shp")
## Reading layer `england_lad_2011_gen' from data source `/Users/reka/Dropbox (The University of Manchester)/crimemapping_textbook_bookdown/data/England_lad_2011_gen/england_lad_2011_gen.shp' using driver `ESRI Shapefile'
## Simple feature collection with 326 features and 4 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 82644.8 ymin: 5349.399 xmax: 655976.9 ymax: 657599.5
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
We can now join the EU referendum data, as we have learned in the past weeks:
eu_sf <- left_join(las, eu_ref, by = c("name" = "Area"))
## Warning: Column `name`/`Area` joining factors with different levels,
## coercing to character vector
#make sure we are in British National Grid Projection
eu_sf <- st_transform(eu_sf, 27700)
Now we can have a look at these data:
ggplot() +
geom_sf(data = eu_sf, aes(fill = Pct_Leave))

We can see that in smaller LAs we don’t even really see the result, as the boundary lines pretty much cover everything. Hmm. Now what we can do is transform the shapes into squares of hexagons. Let’s have a go at squares.
We can use the functions in the geogrid package.
library(geogrid)
Now we can use the calculate_grid() function. This function, given an input multipolgyon spatial data frame this function calculates a hexagonal or regular grid, that strives to preserve the original geography. Once this is done, we assign each polygon in the original file to a new location in the gridded geometry. NOTE THIS WILL TAKE A WHILE again this is something that is quite computationally intensive, so you set it running, and go for a short break. You need short breaks in coding.
However this particular bit of code will actually take about 10 minutes. So if you’re about to leave, wait until you are home to run it.
#first make our sf object into an sp object
eu_sp <- as(eu_sf, 'Spatial')
#then use the calculate_grid function. Note how we specify grid type to be "regular".
eu_reg <- calculate_grid(shape = eu_sp, grid_type = "regular", seed = 1)
#assign the polygons
eu_reg <- assign_polygons(eu_sp, eu_reg)
#now turn it back into sf object for easy ggplot plotting
eu_reg <- st_as_sf(eu_reg)
While you’re waiting for that, you can have a read about how the polygons are assigned. The help file will tell you they use “the Hungarian algorithm”. This is from a paper called The Hungarian Method for the Assignment Problem by Harold W. Kuhn. Apparently, this method was so promising, that when Kuhn came across it referenced in a paper, even though the original paper is in Hungarian, Kuhn describes he “took out a Hungarian grammar and a large Hungarian-English dictionary and taught myself enough Hungarian to translate Egervary’s paper.” I can assure you, Hungarian is not an easy language to learn, so this must have been a most excellent theory to prompt such an act.
Now that your code has run, and you have your eu_reg shapefile, we can plot this!
ggplot() +
geom_sf(data = eu_reg, aes(fill = Pct_Leave))

Cool, eh?
5.5.1 Homework 5.3
Reproduce the above grid map using hexagons instead of squares.
5.6 Cartograms
The last thing we will do today is make some cartograms! Cartogram types of maps distort reality to convey information. They resize and exaggerate any variable on an attribute value.
There are different types of cartograms.
Density-equalizing (contiguous) cartograms are your traditional cartograms. In density-equalizing cartograms, map features bulge out a specific variable. Even though it distorts each feature, it remains connected during its creation. On the other hand, you can have Non-Contiguous Cartograms, where features in non-contiguous cartograms don’t have to stay connected. Finally, Dorling Cartogram (named after professor Danny Dorling) uses shapes like circles and rectangles to depict area. These types of cartograms make it easy to recognize patterns. For example, here is a Darling Cartogram I made of FixMyStreet reports in London:

Now we can make our own as well, using the cartogram package.
library(cartogram)
Within that there is the cartogram() function, which takes 2 arguments, 1 - the shape file (it can be a SpatialPolygonDataFrame or an sf object), and 2 - the variable which it should use to distort the polygon by.
In our data set we have a variable “electorate” which refers to the total number of registered electors
# construct a cartogram using the percentage voting leave
eu_cartogram <- cartogram(eu_sf, "Electorate")
Again this is some labour intensive work, much like the grid making, you have some time to chill now. Maybe read up on the maths behind this tranformation as well, in the paper Dougenik, J. A., Chrisman, N. R., & Niemeyer, D. R. (1985). An Algorithm To Construct Continuous Area Cartograms. In The Professional Geographer, 37(1), 75-81..
I do have a tip for you if you want to make sure the process does not take too long. You can set a parameter in the cartogram function which is the “itermax” parameter. This specifies the maximum number of iterations we are happy with. If you don’t specify it’s set to 15. Let’s set to 5 for the sake of speed:
# construct a cartogram using the percentage voting leave
eu_cartogram <- cartogram_cont(eu_sf, "Electorate", itermax = 5)
## Mean size error for iteration 1: 4.11884641548982
## Mean size error for iteration 2: 21.1096264523744
## Mean size error for iteration 3: 3.05848955161026
## Mean size error for iteration 4: 2.51480274961733
## Mean size error for iteration 5: 2.92812677999131
And if your cartogram has been created, you can now plot again the referendum results, but using the electorate to change the size of the locad authority:
ggplot() +
geom_sf(data = eu_cartogram, aes(fill = Pct_Leave))

We can now see London much better, and see that darker coloured cluster where much smaller percentage of people voted leave.
Okay that’s probably enough for the day. Nice work crime mappers!
5.7 References and further reading
5.7.1 Binning
- Johnson, Zachary Forest. (18 October 2011). “Hexbins!” Retrieved from: http://indiemaps.com/blog/2011/10/hexbins/ (8 August 2014).
- Smith, Nate. (25 May 2012). “Binning: An Alternative to Point Maps.” Mapbox. Retrieved from: https://www.mapbox.com/blog/binning-alternative-point-maps/ (8 August 2014).
- Claudia A Engel’s fantastic R-pub on Making Maps in R.
- Hexbin Graph Gallery
- US Drought Hexmap
- Hexbin with ggplot2
5.7.2 MAUP
- Gerell, Manne. “Smallest is better? The spatial distribution of arson and the modifiable areal unit problem.” Journal of quantitative criminology 33.2 (2017): 293-318.
- Openshaw, Stan. “The modifiable areal unit problem. CATMOG (Concepts and techniques in modern geography) 38.” Geo Abstracts, Norwich. 1984..
- Rengert, George F., and Brian Lockwood. “Geographical units of analysis and the analysis of crime.” Putting crime in its place. Springer, New York, NY, 2009. 109-122.
###Transforming polygons
- Waldo Tobler (2004) Thirty Five Years of Computer Cartograms, Annals of the Association of American Geographers, 94:1, 58-73, DOI: 10.1111/j.1467-8306.2004.09401004.x
- Langton, S.H. & Solymosi, R. (2018) ‘Visualising geographic information: examining methods of improving the thematic map.’ RPubs. Available: https://rpubs.com/langton_/visual_geography_study
#Studying spatial point patterns
5.8 What we’ll do today
We have now covered quite a bit. You’ve learnt about spatial objects and various formats in which they come and are stored by R, how to produce maps using a variety of packages, and also provided you with a brief introduction to common spatial operations. In what remains of the semester we are going to shift the emphasis and start focusing a bit more on spatial statistics. First we will focus on techniques that are used to explore and analyse points in a geographical space and in subsequent sessions we will cover techniques that are used to analyse spatial data when our unit of analysis are polygons (e.g., postal code areas, census areas, police beats, etc).
We will introduce a new R package called spatstat, that was developed for spatial point pattern analysis and modelling. It was written by Adrian Baddeley and Rolf Turner. There is a webpage dedicated to this package. The thickest book in my library, at 810 pages, is dedicated to this package. So as you can imagine the theory and practice of spatial pattern analysis is something one could devote an entire course to. You can get a pdf document used in a course the authors of this package develop here. In our course we are only going to provide you with an introductory practical entry into this field of techniques. If this package is not installed in your machine, make sure you install it before we carry on.
library(sf)
library(tmap)
library(dplyr)
library(spatstat)
5.9 Getting the data
We will using the crime data from Greater Manchester police we have been using so far. Let’s focus on burglary in the Fallowfield area. The code below has already been explained and used in previous sessions, so we won’t go over the detail again. But rather than cut and paste automatically, try to remember what each line of code is doing.
By the way, the police data for Manchester we have used in previous sessions correspond to only one month of the year. Here we are using a full year worth of data, so the data import will take a bit longer.
#Read a geojson file with Manchester wards
manchester_ward <- st_read("https://raw.githubusercontent.com/RUMgroup/Spatial-data-in-R/master/rumgroup/data/wards.geojson")
## Reading layer `OGRGeoJSON' from data source `https://raw.githubusercontent.com/RUMgroup/Spatial-data-in-R/master/rumgroup/data/wards.geojson' using driver `GeoJSON'
## Simple feature collection with 215 features and 12 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 351664 ymin: 381168.6 xmax: 406087.5 ymax: 421039.8
## epsg (SRID): 27700
## proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
#Create a new object that only has the city centre ward
df1 <- manchester_ward %>%
filter(wd16nm == "Fallowfield")
#Change coordinate systems
fallowfield <- st_transform(df1, 4326)
#Get rid of objects we no longer need
rm(manchester_ward)
rm(df1)
#Read Greater Manchester police data
crimes <- read.csv("https://raw.githubusercontent.com/jjmedinaariza/CrimeMapping/master/gmpcrime.csv")
burglary <- filter(crimes, crime_type == "Burglary")
#Transform the dataframe with crime information into a sf object
burglary_spatial = st_as_sf(burglary, coords = c("long", "lat"),
crs = 4326, agr = "constant")
#Select only the crimes that take place within the space defined by the Ward boundaries
# intersection
bur_fal <- st_intersects(fallowfield, burglary_spatial)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
# subsetting
bur_fal <- burglary_spatial[unlist(bur_fal),]
rm(crimes)
rm(burglary)
#Let's see the results
tm_shape(fallowfield) +
tm_fill() +
tm_shape(bur_fal) +
tm_dots()

In the point pattern analysis literature each point is often referred to as an event and these events can have marks, attributes or characteristics that are also encoded in the data. In our spatial object one of these marks is the type of crime (altough in this case it’s of little interest since we have filtered on it).
5.10 Getting the data into spatstat: the problem with duplicates
So let’s start using spatstat.The first thing we need to do is to transform our sf object into a ppp object which is how spatstat likes to store its point patterns. Unfortunately, spatstat and many other packages for analysis of spatial data precede sf, so the transformation is a bit awkard. Also before we do that, it is important to realise that a point pattern is defined as a series of events in a given area, or window, of observation. It is therefore extremely important to precisely define this window. In spatstat the function owin() is used to set the observation window. However, the standard function takes the coordinates of a rectangle or of a polygon from a matrix, and therefore it may be a bit tricky to use. Luckily the package maptools provides a way to transform a SpatialPolygons into an object of class owin, using the function as.owin
#First we transform our Falllowfield polygon into a sp object
fallowfield_sp <-as(fallowfield, "Spatial")
#Then we use the as.owin function to define the window
window <- maptools::as.owin.SpatialPolygons(fallowfield_sp)
#Check that this worked
class(window)
## [1] "owin"
window
## window: polygonal boundary
## enclosing rectangle: [-2.2581074, -2.2141921] x [53.43904, 53.45145] units
Now that we have created the window as an owin object let’s get the points:
#First we will extract the coordinates from our sf point data into a matrix
sf_bur_fal_coords <- matrix(unlist(bur_fal$geometry), ncol = 2, byrow = T)
#Then we use the ppp function to create the object using the information from our matrix and the window that we created
bur_ppp <- ppp(x = sf_bur_fal_coords[,1], y = sf_bur_fal_coords[,2],
window = window, check = T)
## Warning: data contain duplicated points
plot(bur_ppp)

Notice the warning message about duplicates. In spatial point pattern analysis an issue of significance is the presence of duplicates. The statistical methodology used for spatial point pattern processes is based largely on the assumption that processes are simple, that is, that the points cannot be coincident. That assumption may be unreasonable in many contexts (for example, the literature on repeat victimisation indeed suggests that we should expect the same households to be at a higher risk of being hit again). Even so the point (no pun intended) is that “when the data has coincidence points, some statistical procedures will be severely affected. So it is always strongly advisable to check for duplicate points and to decide on a strategy for dealing with them if they are present” (Baddeley et al., 2016: 60).
We can check the duplication in a ppp object with the following syntax:
any(duplicated(bur_ppp))
## [1] TRUE
To count the number of coincidence points we use the multiplicity() function. This will return a vector of integers, with one entry for each observation in our dataset, giving the number of points that are identical to the point in question (including itself).
multiplicity(bur_ppp)
If you want to know how many locations have more than one event you can use:
sum(multiplicity(bur_ppp) > 1)
## [1] 190
That’s quite something. 190 points out of 223 here share coordinates.
tm_shape(fallowfield) +
tm_fill() +
tm_shape(bur_fal) +
tm_dots(alpha=0.4, size=1)

In the case of crime, as we have hinted some of this may be linked to the nature of crime itself. Hint: repeat victimisation. However, this pattern of duplication is fairly obvious across all crime categories in the police.uk website and, although I have not explored this in detail, I strongly suspect it is a function of the anonimisation process used to create these maps. The coordinates provided in the open data are not the exact locations of crimes, but they come from a list of points generated for purposes of data publication. You can see the details here. This process is likely inflating the amount of duplication we observe. So keep in mind when analysing and working with this data that it is not the same as working with the real locations.
What to do about duplicates in spatial point pattern analysis is not always clear. You could simply delete the duplicates, but of course that may ignore issues such as repeat victimisation. You could also use jittering, which will add a small perturbation to the duplicate points so that they do not occupy the exact same space. Which again, may ignore things like repeat victimisation. Another alternative is to make each point “unique” and then attach the multiplicites of the points to the patterns as marks, as attributes of the points. Then you would need analytical techniques that take into account these marks.
If you were to be doing this for real you would want access to the real thing, not this public version of the data and then go for the second solution suggested above. We don’t have access to the source data, so for the sake of simplicity and so that we can illustrate how spatstat works we will add some jittering to the data. The first argument for the function is the object, retry asks whether we want the algorithm to have another go if the jittering places a point outside the window (we want this so that we don’t loose points), and the drop argument is used to ensure we get a ppp object as a result of running this function.
jitter_bur <- rjitter(bur_ppp, retry=TRUE, nsim=1, drop=TRUE)
plot(jitter_bur)

Notice the difference with the original plot. Can you see how the circumferences do not overlap perfectly now?
5.11 Inspecting our data with spatstat
This package supports all kind of exploratory point pattern analysis. One example of this is quadrant counting. One could divide the window of observation into quadrants and count the number of points into each of these quadrants. For example, if we want four quadrants along the X axis and 3 along the Y axis we could used those parameters in the quadratcount() function. Then we just use standard plotting functions from R base.
Q <- quadratcount(jitter_bur, nx = 4, ny = 3)
plot(jitter_bur)
plot(Q, add = TRUE, cex = 2)

In the video lectures by Luc Anselin would were exposed to the notion of complete spatial randomness (CSR). When we look at a point pattern process the first step in the process is whether it has been generated in a random manner. Under CSR, points are independent of each other and have the same propensity to be found at any location. We can generate data that conform to complete spatial randomness using the rpoispp() function. The r at the beginning is used to denote we are simulating data (you will this is common in R) and we are using a Poisson point process. Let’s generate 223 points in a random manner:
plot(rpoispp(223))

You will notice that the points in a homogeneous Poisson process are not ‘uniformly spread’: there are empty gaps and clusters of points. Run the previous command a few times. You will see the map generated is different each time.
In classical literature, the homogeneous Poisson process (CSR) is usually taken as the appropriate ‘null’ model for a point pattern. Our basic task in analysing a point pattern is to find evidence against CSR. We can run a Chi Square test to check this. So, for example:
quadrat.test(jitter_bur, nx = 3, ny = 2)
##
## Chi-squared test of CSR using quadrat counts
## Pearson X2 statistic
##
## data: jitter_bur
## X2 = 109.07, df = 5, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
## Quadrats: 6 tiles (irregular windows)
Observing the results we see that the p value is well below convential standards for rejection of the null hypothesis. Observing data as the one we have for burglary in Fallowfield would be extremely rare if the null hypothesis was true. We can then conclude that the burglary data is not randomly distributed in the observed space. But no cop nor criminologist would really question this. We do know that crime is not randomly distributed in space.
5.12 Density estimates
In the presentations by Luc Anselin and the recommended reading materials we introduced the notion of density maps. Kernel density estimation involves applying a function (known as a “kernel”) to each data point, which averages the location of that point with respect to the location of other data points.
ds <- density(jitter_bur)
class(ds)
## [1] "im"
plot(ds, main='Burglary density in Fallowfield')

The density function is estimating a kernel density estimate. Density is nothing but the number of points per unit area. This method computes the intensity continuously across the study area and the object return is a raster image. To perform this analysis in R we need to define the bandwidth of the density estimation, which basically determines the area of influence of the estimation. There is no general rule to determine the correct bandwidth; generally speaking if h is too small the estimate is too noisy, while if h is too high the estimate may miss crucial elements of the point pattern due to oversmoothing (Scott, 2009).
The key argument to pass to the density method for point patterm objects is sigma=, which determines the bandwidth of the kernel. In spatstat the functions bw.diggle, bw.ppl, and bw.scott can be used to estimate the bandwidth according to difference methods. The helpfiles recommend the use of the first two. These functions run algorithms that aim to select an appropriate bandwith.
bw.diggle(jitter_bur)
## sigma
## 4.249614e-05
bw.ppl(jitter_bur)
## sigma
## 0.0005474603
bw.scott(jitter_bur)
## sigma.x sigma.y
## 0.0040703690 0.0008433577
You can see the Diggle algorithm gives you the narrower bandwith. We can test how they work with our dataset using the following code:
par(mfrow=c(2,2))
plot(density.ppp(jitter_bur, sigma = bw.diggle(jitter_bur),edge=T),
main = paste("h = 0.000003"))
plot(density.ppp(jitter_bur, sigma = bw.ppl(jitter_bur),edge=T),
main=paste("h =0.0005"))
plot(density.ppp(jitter_bur, sigma = bw.scott(jitter_bur)[2],edge=T),
main=paste("h = 0.0008"))
plot(density.ppp(jitter_bur, sigma = bw.scott(jitter_bur)[1],edge=T),
main=paste("h = 0.004"))

Baddeley et (2016) suggest the use of the bw.ppl algorithm because in their experience it tends to produce the more appropriate values when the pattern consists predominantly of tight clusters. But they also insist that to detect a single tight cluster in the midst of random noise the bw.diggle method seems to work best.
5.13 Adding some context
Often it is convenient to use a basemap to provide context. In order to do that we first need to turn the image object generated by the spatstat package into a raster object, a more generic format for raster image used in R.
library(raster)
## Loading required package: sp
##
## Attaching package: 'raster'
## The following objects are masked from 'package:spatstat':
##
## area, rotate, shift
## The following object is masked from 'package:nlme':
##
## getData
## The following object is masked from 'package:dplyr':
##
## select
dmap1 <- density.ppp(jitter_bur, sigma = bw.ppl(jitter_bur),edge=T)
r1 <- raster(dmap1)
#remove very low density values
r1[r1 < 0.0001 ] <- NA
class(r1)
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
Now that we have the raster we can add it to a basemap.
Two-dimensional RasterLayer objects (from the raster package) can be turned into images and added to Leaflet maps using the addRasterImage function.
The addRasterImage function works by projecting the RasterLayer object to EPSG:3857 and encoding each cell to an RGBA color, to produce a PNG image. That image is then embedded in the map widget.
It’s important that the RasterLayer object is tagged with a proper coordinate reference system. Many raster files contain this information, but some do not. Here is how you’d tag a raster layer object “r” which contains WGS84 data:
library(leaflet)
#make sure we have right CRS. We need the sp package for setting CRS to this raster:
library(sp)
crs(r1) <- sp::CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
#we also create a colour palet
pal <- colorNumeric(c("#0C2C84", "#41B6C4", "#FFFFCC"), values(r1),
na.color = "transparent")
#and then make map!
leaflet() %>%
setView(lng = -2.225814, lat = 53.441315, zoom = 14) %>%
addTiles() %>%
addRasterImage(r1, colors = pal, opacity = 0.8) %>%
addLegend(pal = pal, values = values(r1),
title = "Burglary map")
## Warning in colors(.): Some values were outside the color scale and will be
## treated as NA
And therw you have it. Perhaps those familiar with Fallowfield have some guesses as to what may be going on there?
5.13.1 Homework 1
Ok, so see if you can do something like what we have done today, but for violent crime in the city centre. Produce the density estimates and then plot the density plot. In addition add a layer of points with the licenced premises we looked at last week.
5.13.2 Homework 2
Produce a kernel density estimate for burglary across the whole of the city. Where is burglary more concentrated?
#Global and local spatial autocorrelation
This session we begin to …
library(sf)
library(tmap)
library(dplyr)
library(sp)
library(spdep)
5.14 Get data
So, let’s start by getting some data. We are going to take some of the data from past weeks. In getting the data ready you will have one more opportunity to practice how to read data into R but also how to perform some basic spatial checks, transformations and operations. It may seem like you have already done some of this stuff. But that’s the point: to force you to practice. The more you do this stuff, the easier it will be and -trust us- eventually things will click and become second nature. First let’s get the LSOA boundary data.
shp_name <- "data/BoundaryData/england_lsoa_2011.shp"
manchester_lsoa <- st_read(shp_name)
## Reading layer `england_lsoa_2011' from data source `/Users/reka/Dropbox (The University of Manchester)/crimemapping_textbook_bookdown/data/BoundaryData/england_lsoa_2011.shp' using driver `ESRI Shapefile'
## Simple feature collection with 282 features and 3 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 378833.2 ymin: 382620.6 xmax: 390350.2 ymax: 405357.1
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
Now check the coordinate system.
st_crs(manchester_lsoa)
## Coordinate Reference System:
## No EPSG code
## proj4string: "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs"
There is no EPSG code assigned, but notice the datum is given for BNG. Let’s address this.
lsoa_WGS84 <- st_transform(manchester_lsoa, 4326)
st_crs(lsoa_WGS84)
## Coordinate Reference System:
## EPSG: 4326
## proj4string: "+proj=longlat +datum=WGS84 +no_defs"
plot(st_geometry(lsoa_WGS84))

rm(manchester_lsoa)
Let’s add the burglary data from Greater Manchester. We have practiced this code in previous sessions so we won’t go over it on detail again, but try to remember and understand what each line of code rather than blindly cut and paste. If you don’t understand what each of these lines of codes is doing call us out.
#Read into R
crimes <- read.csv("https://raw.githubusercontent.com/jjmedinaariza/CrimeMapping/master/gmpcrime.csv")
#Filter out to select burglary
burglary <- filter(crimes, crime_type == "Burglary")
#Transform into spatial object
burglary_spatial = st_as_sf(burglary, coords = c("long", "lat"),
crs = 4326, agr = "constant")
#Remove redundant non spatial burglary object
rm(burglary)
rm(crimes)
#Select burglaries that intersect with the Manchester city LSOA map.
bur_mc <- st_intersects(lsoa_WGS84, burglary_spatial)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
bur_mc <- burglary_spatial[unlist(bur_mc),]
#Check results
plot(st_geometry(bur_mc))

#Remove redundant objects
rm(burglary_spatial)
We have the burglaries, let’s now count how many burglaries there is within each LSOA polygon. This is a point in polygon operation that we cover a couple of weeks ago. If the code or the notion does not make much sense to you make sure you review the relevant session from week 4.
#Point in polygon spatial operation
burglaries_per_lsoa <- bur_mc %>%
st_join(lsoa_WGS84, ., left = FALSE) %>%
count(code)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
#Let's rename the column with the count of burglaries (n) into something more meaningful
burglaries_per_lsoa <- rename(burglaries_per_lsoa, burglary = n)
#Plot with tmap
tm_shape(burglaries_per_lsoa) +
tm_fill("burglary", style = "quantile", palette = "Reds") +
tm_borders(alpha = 0.1) +
tm_layout(main.title = "Burglary counts", main.title.size = 0.7 ,
legend.position = c("right", "bottom"), legend.title.size = 0.8)

Do you see any patterns? Are burglaries randomly spread around the map? Or would you say that areas that are closer to each other tend to be more alike? Is there evidence of clustering? Do burglaries seem to appear in certain pockets of the map? In this session we are going to discuss ways in which you can quantify the answer to this question. We will discuss measures of global spatial autocorrelation, which essentially aim to answer the degree to which areas that are near each other tend to be more alike. We say global because we are interested in the degree of clustering not on the location of the clusters. Later we will also cover techniques to identify local clusters of autocorrelation, but for now we will focus in quantifying whether areas are (on average) alike their neighbours.
5.15 What is a neighbour?
Previously I asked whether areas are alike their neighbours or to areas that are close. But what is a neighbour? Or what do we mean by close? How can one define a set of neighbours for each area? If we want to know if what we measure in a particular area is similar to what happens on its neighbouring areas, we need to establish what we mean by a neighbour.
There are various ways of defining a neighbour. We can say that two areas are neighbours if they share boundaries, if they are next to each other. In this case we talk of neighbours by contiguity. By contiguous you can, at the same time, mean all areas that share common boundaries (what we call contiguity using the rook criteria, like in chess) or areas that share common boundaries and common corners, that is, that have any point in common (and we call this contiguity using the queen criteria).
When defining neighbours by contiguity we may also specify the order of contiguity. First order contiguity means that we are focusing on areas immediately contiguous. Second order means that we consider neighbours only those areas that are immediately contiguous to our first order neighbours (only the yellow areas in the figure below) and you could go on and on. Look at the graphic below for clarification:
Figure 1
Alternatively we may define neighbours by distance. You can consider neighbours those areas that distant-wise are close to each other (regardless of whether boundaries are shared). In other words, areas will be defined as neighbours if they are within a specified radius.
In sum, adjacency is an important concept in some spatial analysis. In some cases objects are considered ajacent when they “touch”, e.g. neighboring countries. Contiguity measures tend to be more common when studying areas. It can also be based on distance. This is the most common approach when analyzing point data, but can also be relevant when studying areas.
You will come across the term spatial weight matrix at some point or, using mathematical notation, W. Essentially the spatial weight matrix is a n by n matrix with ones and zeroes (in the case of contiguity based definitions) identifying if any two observations are neighbours. So you can think of the spatial weight matrix as the new data table that we are constructing with our definition of neighbours, whichever this is.
How do you build such a matrix with R? Well, let’s turn to that. But to make things a bit simpler, let’s focus not on the whole of Manchester, but just in the LSOAs within the city centre. We will use familiar code to clip the spatial object with the counts of burglaries to only those that intersect with the City Centre ward. Again, we have covered this code elsewhere, so we won’t explain here in detail. But don’t just cut and paste, if there is anything in this code you don’t fully understand you are expected to ask us.
#Read a geojson file with Manchester wards
manchester_ward <- st_read("https://raw.githubusercontent.com/RUMgroup/Spatial-data-in-R/master/rumgroup/data/wards.geojson")
## Reading layer `OGRGeoJSON' from data source `https://raw.githubusercontent.com/RUMgroup/Spatial-data-in-R/master/rumgroup/data/wards.geojson' using driver `GeoJSON'
## Simple feature collection with 215 features and 12 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 351664 ymin: 381168.6 xmax: 406087.5 ymax: 421039.8
## epsg (SRID): 27700
## proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
#Create a new object that only has the city centre ward
df1 <- manchester_ward %>%
filter(wd16nm == "City Centre")
#Change coordinate systems
cc_ward <- st_transform(df1, 4326)
#Check if they match those of the imd_gm object
st_crs(cc_ward) == st_crs(burglaries_per_lsoa)
## [1] TRUE
#Get rid of objects we no longer need
rm(manchester_ward)
rm(df1)
#Intersect
cc_intersects <- st_intersects(cc_ward, burglaries_per_lsoa)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
cc_burglary <- burglaries_per_lsoa[unlist(cc_intersects),]
#Plot with tmap
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(cc_burglary) +
tm_fill("burglary", style = "quantile", palette = "Reds", id="code") +
tm_borders() +
tm_layout(main.title = "Burglary counts", main.title.size = 0.7 ,
legend.position = c("right", "top"), legend.title.size = 0.8)
## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.
So now we have a new spatial object cc_burglary with the 23 LSOA units that compose the City Centre of Manchester. By focusing in a smaller subset of areas we can understand perhaps a bit better what comes next. But again we carry on. Do you perceive here some degree of spatial autocorrelation?
5.15.1 Homework 1
The id argument in the tm_fill ensures that when you click over any of the areas you get not only the count of burglaries in that LSOA (the quantity we are mapping) gets displayed within a bubble, but you also get to see the code that identifies that LSOA.
Move your cursor over the LSOA covering Beswick. You will see this area had 58 burglaries in 2017 and that the LSOA identifier is E01033688. Using the rook criteria identify the first order neighbors of this LSOA. List their identifiers. Are things different if we use the queen criteria? If so, how does it change? Now. Think and think hard about what the lecture by Luc Anseling discussed. Have you identified all the neighbours of this area? (there are multiple ways of answering this question, just make sure you reason your answer).
5.16 Creating a list of neighbours
It would be very, very tedious having to identify the neighbours of all the areas in our study area in the way we have done in homework 1. That’s why we love computers. We can automate tedious work so that they do it and we have more time to do fun stuff. We can use code to get the computer to establish what areas are next to each other (if we are using a contiguity based definition of being a neighbour).
We have already discussed how sf is new kid in town and although package developers are quickly trying to adapt their packages to work with sf, many of the existing spatial packages still expect sp objects. So to illustrate the concepts in this week session we are first going to turn our sf object into sp.
#Before we do that I want to turn the factor identifiying the LSOAs into a character vector (since I will need this later on)
cc_burglary$lsoa_code <- as.character(cc_burglary$code)
#We coerce the sf object into a new sp object that we are calling bur_ccsp (remember names are arbitrary)
bur_ccsp <- as(cc_burglary, "Spatial")
In order to identify neighbours we will use the poly2nb() function from the spdep package that we loaded at the beginning of our session. The spdep package provides basic functions for building neighbour lists and spatial weights, tests for spatial autocorrelation for areal data like Moran’s I, and functions for fitting spatial regression models.
This function builds a neigbours list based on regions with contiguous boundaries. If you look at the documentation you will see that you can pass a “queen” argument that takes TRUE or FALSE as options.If you do not specify this argument the default is set to true, that is, if you don’t specify queen = FALSE this function will return a list of first order neighbours using the Queen criteria.
w <- poly2nb(bur_ccsp, row.names=bur_ccsp$lsoa_code)
class(w)
## [1] "nb"
This has created a nb, neighbour list object. We can get some idea of what’s there if we ask for a summary.
summary(w)
## Neighbour list object:
## Number of regions: 23
## Number of nonzero links: 100
## Percentage nonzero weights: 18.90359
## Average number of links: 4.347826
## Link number distribution:
##
## 2 3 4 5 6 7 8
## 3 4 6 5 3 1 1
## 3 least connected regions:
## E01033673 E01033674 E01033684 with 2 links
## 1 most connected region:
## E01033658 with 8 links
This is basically telling us that using this criteria each LSOA polygon has an average of 4.3 neighbours (when we just focus on the city centre) and that all areas have some neighbours (there is no islands). The link number distribution gives you the number of links (neighbours) per area. So here we have 3 polygons with 2 neighbours, 3 with 3, 6 with 4, and so on. The summary function here also identifies the areas sitting at both extreme of the distribution.
For more details we can look at the structure of w.
str(w)
## List of 23
## $ : int [1:4] 2 6 11 23
## $ : int [1:5] 1 3 4 6 8
## $ : int [1:5] 2 5 6 8 12
## $ : int [1:3] 2 8 21
## $ : int [1:7] 3 6 9 10 12 14 18
## $ : int [1:6] 1 2 3 5 10 11
## $ : int [1:3] 9 13 17
## $ : int [1:6] 2 3 4 12 18 21
## $ : int [1:8] 5 7 13 14 17 18 19 20
## $ : int [1:4] 5 6 11 14
## $ : int [1:5] 1 6 10 22 23
## $ : int [1:4] 3 5 8 18
## $ : int [1:3] 7 9 14
## $ : int [1:4] 5 9 10 13
## $ : int [1:4] 16 19 20 21
## $ : int [1:2] 15 19
## $ : int [1:2] 7 9
## $ : int [1:6] 5 8 9 12 20 21
## $ : int [1:4] 9 15 16 20
## $ : int [1:5] 9 15 18 19 21
## $ : int [1:5] 4 8 15 18 20
## $ : int [1:2] 11 23
## $ : int [1:3] 1 11 22
## - attr(*, "class")= chr "nb"
## - attr(*, "region.id")= chr [1:23] "E01005065" "E01005066" "E01005128" "E01005212" ...
## - attr(*, "call")= language poly2nb(pl = bur_ccsp, row.names = bur_ccsp$lsoa_code)
## - attr(*, "type")= chr "queen"
## - attr(*, "sym")= logi TRUE
We can graphically represent the links using the following code:
#We first plot the boundaries
plot(bur_ccsp, col='gray', border='blue', lwd=2)
#Then we use the coordinates function to obtain the coordinates of the polygon centroids
xy <- coordinates(bur_ccsp)
#Then we draw lines between the polygons centroids for neighbours that are listed as linked in w
plot(w, xy, col='red', lwd=2, add=TRUE)

5.17 Generating the weight matrix
We can transform w into a spatial weights matrix. A spatial weights matrix reflects the intensity of the geographic relationship between observations. For this we use the spdep function nb2mat().
wm <- nb2mat(w, style='B')
wm
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## E01005065 0 1 0 0 0 1 0 0 0 0 1 0
## E01005066 1 0 1 1 0 1 0 1 0 0 0 0
## E01005128 0 1 0 0 1 1 0 1 0 0 0 1
## E01005212 0 1 0 0 0 0 0 1 0 0 0 0
## E01033653 0 0 1 0 0 1 0 0 1 1 0 1
## E01033654 1 1 1 0 1 0 0 0 0 1 1 0
## E01033655 0 0 0 0 0 0 0 0 1 0 0 0
## E01033656 0 1 1 1 0 0 0 0 0 0 0 1
## E01033658 0 0 0 0 1 0 1 0 0 0 0 0
## E01033659 0 0 0 0 1 1 0 0 0 0 1 0
## E01033661 1 0 0 0 0 1 0 0 0 1 0 0
## E01033662 0 0 1 0 1 0 0 1 0 0 0 0
## E01033664 0 0 0 0 0 0 1 0 1 0 0 0
## E01033667 0 0 0 0 1 0 0 0 1 1 0 0
## E01033672 0 0 0 0 0 0 0 0 0 0 0 0
## E01033673 0 0 0 0 0 0 0 0 0 0 0 0
## E01033674 0 0 0 0 0 0 1 0 1 0 0 0
## E01033677 0 0 0 0 1 0 0 1 1 0 0 1
## E01033681 0 0 0 0 0 0 0 0 1 0 0 0
## E01033682 0 0 0 0 0 0 0 0 1 0 0 0
## E01033683 0 0 0 1 0 0 0 1 0 0 0 0
## E01033684 0 0 0 0 0 0 0 0 0 0 1 0
## E01033688 1 0 0 0 0 0 0 0 0 0 1 0
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
## E01005065 0 0 0 0 0 0 0 0 0 0
## E01005066 0 0 0 0 0 0 0 0 0 0
## E01005128 0 0 0 0 0 0 0 0 0 0
## E01005212 0 0 0 0 0 0 0 0 1 0
## E01033653 0 1 0 0 0 1 0 0 0 0
## E01033654 0 0 0 0 0 0 0 0 0 0
## E01033655 1 0 0 0 1 0 0 0 0 0
## E01033656 0 0 0 0 0 1 0 0 1 0
## E01033658 1 1 0 0 1 1 1 1 0 0
## E01033659 0 1 0 0 0 0 0 0 0 0
## E01033661 0 0 0 0 0 0 0 0 0 1
## E01033662 0 0 0 0 0 1 0 0 0 0
## E01033664 0 1 0 0 0 0 0 0 0 0
## E01033667 1 0 0 0 0 0 0 0 0 0
## E01033672 0 0 0 1 0 0 1 1 1 0
## E01033673 0 0 1 0 0 0 1 0 0 0
## E01033674 0 0 0 0 0 0 0 0 0 0
## E01033677 0 0 0 0 0 0 0 1 1 0
## E01033681 0 0 1 1 0 0 0 1 0 0
## E01033682 0 0 1 0 0 1 1 0 1 0
## E01033683 0 0 1 0 0 1 0 1 0 0
## E01033684 0 0 0 0 0 0 0 0 0 0
## E01033688 0 0 0 0 0 0 0 0 0 1
## [,23]
## E01005065 1
## E01005066 0
## E01005128 0
## E01005212 0
## E01033653 0
## E01033654 0
## E01033655 0
## E01033656 0
## E01033658 0
## E01033659 0
## E01033661 1
## E01033662 0
## E01033664 0
## E01033667 0
## E01033672 0
## E01033673 0
## E01033674 0
## E01033677 0
## E01033681 0
## E01033682 0
## E01033683 0
## E01033684 1
## E01033688 0
## attr(,"call")
## nb2mat(neighbours = w, style = "B")
This matrix has values of 0 or 1 indicating whether the elements listed in the rows are adjacent (using our definition, which in this case was the Queen criteria) with each other. The diagonal is full of zeroes. An area cannot be a neighbour of itself. So, if you look at the first two and the second column you see a 1. That means that the LSOA with the code E01005065 is a neighbour of the second LSOA (as listed in the rows) which is E01005066. You will zeroes for many of the other columns because this LSOA only has 4 neighbours.
5.17.1 Homework 2
Looking at this matrix identify the other 3 neigbours of E01005065
In many computations we will see that the matrix is row standardised. We can obtain a row standardise matrix changing the code:
wm_rs <- nb2mat(w, style='W')
wm_rs
## [,1] [,2] [,3] [,4] [,5] [,6]
## E01005065 0.0000000 0.2500000 0.0000000 0.0000000 0.0000000 0.2500000
## E01005066 0.2000000 0.0000000 0.2000000 0.2000000 0.0000000 0.2000000
## E01005128 0.0000000 0.2000000 0.0000000 0.0000000 0.2000000 0.2000000
## E01005212 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000 0.0000000
## E01033653 0.0000000 0.0000000 0.1428571 0.0000000 0.0000000 0.1428571
## E01033654 0.1666667 0.1666667 0.1666667 0.0000000 0.1666667 0.0000000
## E01033655 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## E01033656 0.0000000 0.1666667 0.1666667 0.1666667 0.0000000 0.0000000
## E01033658 0.0000000 0.0000000 0.0000000 0.0000000 0.1250000 0.0000000
## E01033659 0.0000000 0.0000000 0.0000000 0.0000000 0.2500000 0.2500000
## E01033661 0.2000000 0.0000000 0.0000000 0.0000000 0.0000000 0.2000000
## E01033662 0.0000000 0.0000000 0.2500000 0.0000000 0.2500000 0.0000000
## E01033664 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## E01033667 0.0000000 0.0000000 0.0000000 0.0000000 0.2500000 0.0000000
## E01033672 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## E01033673 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## E01033674 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## E01033677 0.0000000 0.0000000 0.0000000 0.0000000 0.1666667 0.0000000
## E01033681 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## E01033682 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## E01033683 0.0000000 0.0000000 0.0000000 0.2000000 0.0000000 0.0000000
## E01033684 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## E01033688 0.3333333 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [,7] [,8] [,9] [,10] [,11] [,12]
## E01005065 0.0000000 0.0000000 0.0000000 0.0000000 0.2500000 0.0000000
## E01005066 0.0000000 0.2000000 0.0000000 0.0000000 0.0000000 0.0000000
## E01005128 0.0000000 0.2000000 0.0000000 0.0000000 0.0000000 0.2000000
## E01005212 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000 0.0000000
## E01033653 0.0000000 0.0000000 0.1428571 0.1428571 0.0000000 0.1428571
## E01033654 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000
## E01033655 0.0000000 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000
## E01033656 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1666667
## E01033658 0.1250000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## E01033659 0.0000000 0.0000000 0.0000000 0.0000000 0.2500000 0.0000000
## E01033661 0.0000000 0.0000000 0.0000000 0.2000000 0.0000000 0.0000000
## E01033662 0.0000000 0.2500000 0.0000000 0.0000000 0.0000000 0.0000000
## E01033664 0.3333333 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000
## E01033667 0.0000000 0.0000000 0.2500000 0.2500000 0.0000000 0.0000000
## E01033672 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## E01033673 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## E01033674 0.5000000 0.0000000 0.5000000 0.0000000 0.0000000 0.0000000
## E01033677 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.1666667
## E01033681 0.0000000 0.0000000 0.2500000 0.0000000 0.0000000 0.0000000
## E01033682 0.0000000 0.0000000 0.2000000 0.0000000 0.0000000 0.0000000
## E01033683 0.0000000 0.2000000 0.0000000 0.0000000 0.0000000 0.0000000
## E01033684 0.0000000 0.0000000 0.0000000 0.0000000 0.5000000 0.0000000
## E01033688 0.0000000 0.0000000 0.0000000 0.0000000 0.3333333 0.0000000
## [,13] [,14] [,15] [,16] [,17] [,18] [,19]
## E01005065 0.0000000 0.0000000 0.00 0.00 0.0000000 0.0000000 0.000
## E01005066 0.0000000 0.0000000 0.00 0.00 0.0000000 0.0000000 0.000
## E01005128 0.0000000 0.0000000 0.00 0.00 0.0000000 0.0000000 0.000
## E01005212 0.0000000 0.0000000 0.00 0.00 0.0000000 0.0000000 0.000
## E01033653 0.0000000 0.1428571 0.00 0.00 0.0000000 0.1428571 0.000
## E01033654 0.0000000 0.0000000 0.00 0.00 0.0000000 0.0000000 0.000
## E01033655 0.3333333 0.0000000 0.00 0.00 0.3333333 0.0000000 0.000
## E01033656 0.0000000 0.0000000 0.00 0.00 0.0000000 0.1666667 0.000
## E01033658 0.1250000 0.1250000 0.00 0.00 0.1250000 0.1250000 0.125
## E01033659 0.0000000 0.2500000 0.00 0.00 0.0000000 0.0000000 0.000
## E01033661 0.0000000 0.0000000 0.00 0.00 0.0000000 0.0000000 0.000
## E01033662 0.0000000 0.0000000 0.00 0.00 0.0000000 0.2500000 0.000
## E01033664 0.0000000 0.3333333 0.00 0.00 0.0000000 0.0000000 0.000
## E01033667 0.2500000 0.0000000 0.00 0.00 0.0000000 0.0000000 0.000
## E01033672 0.0000000 0.0000000 0.00 0.25 0.0000000 0.0000000 0.250
## E01033673 0.0000000 0.0000000 0.50 0.00 0.0000000 0.0000000 0.500
## E01033674 0.0000000 0.0000000 0.00 0.00 0.0000000 0.0000000 0.000
## E01033677 0.0000000 0.0000000 0.00 0.00 0.0000000 0.0000000 0.000
## E01033681 0.0000000 0.0000000 0.25 0.25 0.0000000 0.0000000 0.000
## E01033682 0.0000000 0.0000000 0.20 0.00 0.0000000 0.2000000 0.200
## E01033683 0.0000000 0.0000000 0.20 0.00 0.0000000 0.2000000 0.000
## E01033684 0.0000000 0.0000000 0.00 0.00 0.0000000 0.0000000 0.000
## E01033688 0.0000000 0.0000000 0.00 0.00 0.0000000 0.0000000 0.000
## [,20] [,21] [,22] [,23]
## E01005065 0.0000000 0.0000000 0.0000000 0.25
## E01005066 0.0000000 0.0000000 0.0000000 0.00
## E01005128 0.0000000 0.0000000 0.0000000 0.00
## E01005212 0.0000000 0.3333333 0.0000000 0.00
## E01033653 0.0000000 0.0000000 0.0000000 0.00
## E01033654 0.0000000 0.0000000 0.0000000 0.00
## E01033655 0.0000000 0.0000000 0.0000000 0.00
## E01033656 0.0000000 0.1666667 0.0000000 0.00
## E01033658 0.1250000 0.0000000 0.0000000 0.00
## E01033659 0.0000000 0.0000000 0.0000000 0.00
## E01033661 0.0000000 0.0000000 0.2000000 0.20
## E01033662 0.0000000 0.0000000 0.0000000 0.00
## E01033664 0.0000000 0.0000000 0.0000000 0.00
## E01033667 0.0000000 0.0000000 0.0000000 0.00
## E01033672 0.2500000 0.2500000 0.0000000 0.00
## E01033673 0.0000000 0.0000000 0.0000000 0.00
## E01033674 0.0000000 0.0000000 0.0000000 0.00
## E01033677 0.1666667 0.1666667 0.0000000 0.00
## E01033681 0.2500000 0.0000000 0.0000000 0.00
## E01033682 0.0000000 0.2000000 0.0000000 0.00
## E01033683 0.2000000 0.0000000 0.0000000 0.00
## E01033684 0.0000000 0.0000000 0.0000000 0.50
## E01033688 0.0000000 0.0000000 0.3333333 0.00
## attr(,"call")
## nb2mat(neighbours = w, style = "W")
Row standardisation of a matrix ensure that the sum of the rows adds up to 1. So, for example, if you have four neighbours and that has to add up to 4, you need to divide 1 by 4, which gives you 0.25. So in the columns for a polygon with 4 neighbours you will see 0.25 in the column representing each of the neighbours.
5.18 Compute Moran’s I
The most well known measure of spatial autocorrelation is the Moran’s I. It was developed by Patrick Alfred Pierce Moran, an Australian statistician. You can find the formula and some explanation in the wikepedia article. The video lecture by Luc Anselin covers an explanation of Moran’s I. We strongly recommend you watch the video. You can also find helpful this link if things are still unclear. The formula you see may look intimidating but it is nothing but the formula for standard correlation expanded to incorporate the spatial weight matrix.
Before we can use the functions from spdep to compute the global Moran’s I we need to create a ‘listw’ type spatial weights object (instead of the matrix we used above). To get the same value as above we use “style=’B’” to use binary (TRUE/FALSE) distance weights.
(try row standardising)
ww <- nb2listw(w, style='B')
ww
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 23
## Number of nonzero links: 100
## Percentage nonzero weights: 18.90359
## Average number of links: 4.347826
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 23 529 100 200 1960
Now we can use the moran function. Have a look at ?moran. The function is defined as ‘moran(y, ww, n, Szero(ww))’. Note the odd arguments n and S0. I think they are odd, because “ww” has that information. Anyway, we supply them and it works. There probably are cases where it makes sense to use other values.
moran(bur_ccsp$burglary, ww, n=length(ww$neighbours), S0=Szero(ww))
## $I
## [1] 0.1203294
##
## $K
## [1] 7.964184
So the Moran’s I here is 0.12, which is not very large. But still. We can now run a test of statistical significance for this statistic.
The Spatial Autocorrelation (Global Moran’s I) tool is an inferential statistic, which means that the results of the analysis are always interpreted within the context of its null hypothesis. For the Global Moran’s I statistic, the null hypothesis states that the attribute being analyzed is randomly distributed among the features in your study area; said another way, the spatial processes promoting the observed pattern of values is random chance. Imagine that you could pick up the values for the attribute you are analyzing and throw them down onto your features, letting each value fall where it may. This process (picking up and throwing down the values) is an example of a random chance spatial process. When the p-value returned by this tool is statistically significant, you can reject the null hypothesis.
In some software you can use statistical tests invoking asymptotic theory, but the only appropriate way of doing these tests is by using a Monte Carlo procedure. The way Monte Carlo works is that the values of burglary are randomly assigned to the polygons, and the Moran’s I is computed. This is repeated several times to establish a distribution of expected values. The observed value of Moran’s I is then compared with the simulated distribution to see how likely it is that the observed values could be considered a random draw.
If confused, watch this quick video on monte carlo simulations.
We use the function moran.mc() to run a permutation test for Moran’s I statistic calculated by using some number of random permutations of our numeric vector, for the given spatial weighting scheme, to establish the rank of the observed statistic in relation to the simulated values.
We need to specify our variable of interest (burglary), the listw object we created earlier (ww), and the number of permutations we want to run (here we choose 99).
set.seed(1234) # The seed number you choose is the starting point used in the generation of a sequence of random numbers, which is why (provided you use the same pseudo-random number generator) you'll obtain the same results given the same seed number.
burg_moranmc_results <- moran.mc(bur_ccsp$burglary, ww, nsim=99)
burg_moranmc_results
##
## Monte-Carlo simulation of Moran I
##
## data: bur_ccsp$burglary
## weights: ww
## number of simulations + 1: 100
##
## statistic = 0.12033, observed rank = 93, p-value = 0.07
## alternative hypothesis: greater
So, the probability of observing this Moran’s I if the null hypothis was true is 0.07. This is higher than our alpha level of 0.05. In this case, we can conclude that there isn’t a significant global spatial autocorrelation.
We can make a “Moran scatter plot” to visualize spatial autocorrelation. Note the row standardisation of the weights matrix.
rwm <- mat2listw(wm, style='W')
# Checking if rows add up to 1 (they should)
mat <- listw2mat(rwm)
#This code is simply adding each row to see if we get one when we add their values up, we are only displaying the first 15 rows in the matrix
apply(mat, 1, sum)[1:15]
## E01005065 E01005066 E01005128 E01005212 E01033653 E01033654 E01033655
## 1 1 1 1 1 1 1
## E01033656 E01033658 E01033659 E01033661 E01033662 E01033664 E01033667
## 1 1 1 1 1 1 1
## E01033672
## 1
Now we can plot:
moran.plot(bur_ccsp$burglary, rwm)

The X axis represents the values of our burglary variable in each unit (each LSOA) and the Y axis represents a spatial lag of this variable. A spatial lag in this context is simply the average value of the burglary count in the areas that are considered neighours of each LSOA. So we are plotting the value of burglary against the average value of burglary in the neighbours. And you can see the correlation is almost flat here. As with any correlation measure, you could get positive spatial autocorrelation, that would mean that as you move further to the right in the X axis you have higher levels of burglary in the surrounding area. This is what we see here. But the correlation is fairly low and as we saw is not statistically significant. You can also obtain negative spatial autocorrelation. That is, that would mean that areas with high level of crime tend (it’s all about the global average effect!) to be surrounded by areas with low levels of crime. This is clearly not what we see here.
It is very important to understand that global statistics like the spatial autocorrelation (Global Moran’s I) tool assess the overall pattern and trend of your data. They are most effective when the spatial pattern is consistent across the study area. In other words, you may have clusters (local pockets of autocorrelation), without having clustering (global autocorrelation). This may happen if the sign of the clusters negate each other.
But don’t just take our word for how important this is, or how it’s commonly applied in criminological research. Instead, now that you’ve gone through on how to do this, and have begun to get a sense of understanding, read the following paper on https://link.springer.com/article/10.1023/A:1007592124641 where the authors make use of Moran’s I to explain spatial characteristics of homicide. You will likely see this in other papers as well, and now you will know what it means and why it’s important.
5.18.1 Homework 3 (Optional Practice)
You know what is coming, don’t you? Yes, you need to compute the Moran’s I for burglary again. But this time, you need to do it for the whole of Manchester city. I won’t mark you down if you don’t try this. But why wouldn’t you?
5.19 Week 7 : Local spatial autocorrelation
5.20 Prelude
Last week we learned about global measures of spatial association, in particular the Moran’s I statistic, which provide a mechanism to make inferences about a population from a sample. While this is useful for us to be able to assess quantitatively whether crime events cluster in a non-random manner, in the words of Jerry Ratcliffe “this simply explains what most criminal justice students learn in their earliest classes.” For example, consider the study of robberies in Philadelphia:

Aggregated to the police districts, this returns a global Moran’s I value (range 1 to 1) of 0.56, which suggests that police sectors with high robbery counts adjoin sectors that also have high robbery counts, and low crime sectors are often neighbors of other low crime sectors, something that should hardly be surprising given the above map (Ratcliffe, Jerry. “Crime mapping: spatial and temporal challenges.” Handbook of quantitative criminology. Springer, New York, NY, 2010. 5-24.).
In today’s session we will learn about local indicators of spatial association (LISA) and show how they allow for the decomposition of global indicators, such as Moran’s I, into the contribution of each observation. The LISA statistics serve two purposes. On one hand, they may be interpreted as indicators of local pockets of nonstationarity, or hot spots. On the other hand, they may be used to assess the influence of individual locations on the magnitude of the global statistic and to identify “outliers” (Anselin, Luc. “Local indicators of spatial association—LISA.” Geographical analysis 27.2 (1995): 93-115.).
These are the packages we will use:
library(tmap)
library(dplyr)
library(sp)
library(sf)
library(spdep)
5.21 Get data and built the weight matrix
For this week we will be going back to the data from last week about burglary in Manchester city. Below is the code to build this. This is code we have already used and covered in previous sessions. If any of this is confusing, or you need a reminder just refer to last week’s notes!
Get the boundary data:
shp_name <- "data/BoundaryData/england_lsoa_2011.shp"
manchester_lsoa <- st_read(shp_name)
## Reading layer `england_lsoa_2011' from data source `/Users/reka/Dropbox (The University of Manchester)/crimemapping_textbook_bookdown/data/BoundaryData/england_lsoa_2011.shp' using driver `ESRI Shapefile'
## Simple feature collection with 282 features and 3 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 378833.2 ymin: 382620.6 xmax: 390350.2 ymax: 405357.1
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
Transform the CRS to WGS84:
lsoa_WGS84 <- st_transform(manchester_lsoa, 4326)
Read the crimes data into R:
crimes <- read.csv("https://raw.githubusercontent.com/jjmedinaariza/CrimeMapping/master/gmpcrime.csv")
Subset the data:Filter out to select burglary:
burglary <- filter(crimes, crime_type == "Burglary")
Of course this is just a dataframe, and not (yet) a spatial object. To map, we must first transform into spatial object:
burglary_spatial <- st_as_sf(burglary, coords = c("long", "lat"),
crs = 4326, #set the CRS to WGS84
agr = "constant") #the 'agr' attribute-geometry-relationship, specifies for each non-geometry attribute column how it relates to the geometry. "constant" is used for attributes that are constant throughout the geometry (e.g. land use)
Now we want to select burglaries inside Manchester City. We can use st_intersects() to select those that intersect with the Manchester city LSOA map.
#intersets
bur_m <- st_intersects(lsoa_WGS84, burglary_spatial)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
#subsetting
bur_m <- burglary_spatial[unlist(bur_m),]
Count the number of crimes in each LSOA by using the point in polygon spatial operation.
(remember this might take a while, it’s checking every single crime point, which you can see is over 400,000 points!)
burglaries_per_lsoa <- bur_m %>%
st_join(lsoa_WGS84, ., left = FALSE) %>%
count(code)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
You can see the column with the count of burglaries is called ‘n’. (You can View() your data or use the names() function to check what your variables are called). This is not great variable naming, so let’s rename n into something more meaningful:
burglaries_per_lsoa <- rename(burglaries_per_lsoa, burglary = n)
Now that’s all our data wrangling done. Woo!
Let’s remove all the dataframes and objects we’ve read in but no longer need, now that we have arrived at our final dataset, burglaries_per_lsoa.
#Remove redundant objects
rm(burglary_spatial)
rm(manchester_lsoa)
#Remove redundant non spatial burglary objects
rm(burglary)
rm(crimes)
And finally, let’s see our map! Plot with tmap:
tm_shape(burglaries_per_lsoa) +
tm_fill("burglary", style = "quantile", palette = "Reds") +
tm_borders(alpha = 0.1) +
tm_layout(main.title = "Burglary counts", main.title.size = 0.7 ,
legend.position = c("right", "bottom"), legend.title.size = 0.8)
## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.
Now that we have the data we need to coerce our newly created sf object into sp and then generate the wieght matrix. Again, what we do here is stuff we cover last week. In fact, if you did the optional homework you will also have run this code.
#Coerce sf into sp
burglary_m <- as(burglaries_per_lsoa, "Spatial")
#Generate list of neighbours using the Queen criteria
w <- poly2nb(burglary_m, row.names=burglary_m$lsoa_code)
#Generate list with weights using row standardisation
ww <- nb2listw(w, style='W')
5.22 Generating and visualising the LISA measures
The first step before we can generate the LISA map is to compute the local Moran’s I. The initial part of the video presentation by Luc Anselin that we expected you to watch explains the formula and logic underpinning these computations and we won’t reiterate here that detail. But at least a a general reminder:
“global tests for spatial autocorrelation introduced last week are calculated from the local relationship between the values observed at a spatial entity and its neighbours, for the neighbourhood definition chosen. Because of this, we can break global measures down into their components, and by extension, construct localised tests intended to detect ‘clusters’ -observations with very similar neighbours- and” spatial outliers “observations with very different neighbours” (Bivand et al. 201)
Let’s first look at the Moran’s scatterplot:
moran.plot(burglary_m$burglary, ww)

Notice how the plot is split in 4 quadrants. The top right corner belongs to areas that have high level of burglary and are surrounded by other areas that have above the average level of burglary. This are the high-high locations that Luc Anselin referred to. The bottom left corner belongs to the low-low areas. These are areas with low level of burglary and surrounded by areas with below average levels of burglary. Both the high-high and low-low represent clusters. A high-high cluster is what you may refer to as a hot spot. And the low-low clusters represent cold spots. In the opposite diagonal we have spatial outliers. They are not outliers in the standard sense, extreme observations, they are outliers in that they are surrounded by areas that are very unlike them. So you could have high-low spatial outliers, areas with high levels of burglary and low levels of surrounding burglary, or low-high spatial outliers, areas that have themselves low levels of burglary (or whatever else we are mapping) and that are surrounded by areas with above average levels of burglary.
You can also see here that the positive spatial autocorrelation is more noticeable when we focus on the whole of Manchester city, unlike what we observed when only looked at Manchester city. You can check this running the global Moran’s I.
moran(burglary_m$burglary, ww, n=length(ww$neighbours), S0=Szero(ww))
## $I
## [1] 0.319477
##
## $K
## [1] 33.59689
moran.mc(burglary_m$burglary, ww, nsim=99999)
##
## Monte-Carlo simulation of Moran I
##
## data: burglary_m$burglary
## weights: ww
## number of simulations + 1: 1e+05
##
## statistic = 0.31948, observed rank = 1e+05, p-value = 1e-05
## alternative hypothesis: greater
You can see that the global Moran’s is 0.32 and that is highly significant. There is indeed global spatial autocorrelation. Knowing this we can try to decompose this, figure out what is driving this global measure.
To compute the local Moran we can use a function from the spdep package.
locm_bm <- localmoran(burglary_m$burglary, ww)
summary(locm_bm)
## Ii E.Ii Var.Ii
## Min. :-1.933870 Min. :-0.003559 Min. :0.07787
## 1st Qu.:-0.004654 1st Qu.:-0.003559 1st Qu.:0.14505
## Median : 0.123084 Median :-0.003559 Median :0.17460
## Mean : 0.319477 Mean :-0.003559 Mean :0.17805
## 3rd Qu.: 0.314142 3rd Qu.:-0.003559 3rd Qu.:0.21894
## Max. :13.201230 Max. :-0.003559 Max. :0.44062
## Z.Ii Pr(z > 0)
## Min. :-4.12540 Min. :0.0000
## 1st Qu.:-0.00256 1st Qu.:0.2202
## Median : 0.29992 Median :0.3821
## Mean : 0.84062 Mean :0.3707
## 3rd Qu.: 0.77152 3rd Qu.:0.5010
## Max. :37.50929 Max. :1.0000
The first column provides the summary statistic for the local moran statistic. Being local you will have one for each of the areas. The last column gives you a p value for this statistic. In order to produce the LISA map we need to do some previous work. First we are going to create some new variables that we are going to need:
First we scale the variable of interest. When we scale burglary what we are doing is re-scaling the values so that the mean is zero. See an explanation of what this does here.
We use scale(), which is a generic function whose default method centers and/or scales the variable. Here centering is done by subtracting the mean (omitting NAs) the corresponding columns, and scaling is done by dividing the (centered) variable by their standard deviations:
#scale the variable of interest and save it to a new column
burglary_m$S_burglary <- scale(burglary_m$burglary) %>% as.vector()
We’ve also added as.vector() to the end, because
Now we also want to account for the spatial dependence of our values. Remember how we keep saying about “The First Law of Geography”, according to Waldo Tobler, is “everything is related to everything else, but near things are more related than distant things.”
So what do we mean by this spatial dependence? When a value observed in one location depends on the values observed at neighboring locations, there is a spatial dependence. And spatial data may show spatial dependence in the variables and error terms.
Why should spatial dependence occur? There are two reasons commonly given. First, data collection of observations associated with spatial units may reflect measurement error. This happens when the boundaries for which information is collected do not accurately reflect the nature of the underlying process generating the sample data. A second reason for spatial dependence is that the spatial dimension of a social or economic characteristic may be an important aspect of the phenomenon. For example, based on the premise that location and distance are important forces at work, regional science theory relies on notions of spatial interaction and diffusion effects, hierarchies of place and spatial spillovers.
There are two types of dependence, spatial error and spatial lag. Here we focus on spatial lag.
Spatial lag is when the dependent variable y in place i is affected by the independent variables in both place i and j.

This will be important to keep in mind when considering spatial regression. With spatial lag in ordinary least square regression, the assumption of uncorrelated error terms is violated, becasuse near things will have associated error terms. Similarly, the assumption of independent observations is also violated, as the observations are influenced by the other observations near them. As a result, the estimates are biased and inefficient. Spatial lag is suggestive of a possible diffusion process – events in one place predict an increased likelihood of similar events in neighboring places.
So to be able to account for the spatial lag in our model, we must create a variable to account for this. We can do this with the lag.listw() function. Remember from last week: a spatial lag in this context is simply the average value of the burglary count in the areas that are considered neighours of each LSOA. So we are plotting the value of burglary against the average value of burglary in the neighbours.
For this we need our listw object, which is the ww object created earlier, when we generated the list with weights using row standardisation. We then pass this listw object into the lag.listw() function, which computes the spatial lag of a numeric vector using a listw sparse representation of a spatial weights matrix.
#create a spatial lag variable and save it to a new column
burglary_m$lag_s_burglary <- lag.listw(ww, burglary_m$S_burglary)
Make sure to check the summaries to ensure nothing weird is going on
summary(burglary_m$S_burglary)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.2149 -0.6048 -0.1908 0.0000 0.2667 9.0256
summary(burglary_m$lag_s_burglary)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.99700 -0.44086 -0.07464 0.06068 0.34738 3.38679
We can create a Moran scatter plot so that you see that nothing has changed apart from the scale in wich we are using the variables. The observations that are influential are higlighted in the plot as you can see.
x <- burglary_m$S_burglary
y <- burglary_m$lag_s_burglary
xx <- data_frame(x,y)
moran.plot(x, ww)

We are now going to create a new variable to identify the quadrant in which each observation falls within the Moran Scatter plot, so that we can tell apart the high-high, low-low, high-low, and low-high areas. We will only identify those that are significant according to the p value that was provided by the local moran function.
Before we get started, let’s quickly review the tools we will use.
All our data is in this burglary_m dataframe. This has a variable for the LSOA code (code), a variable for the number of burglaries (burglary), and then also the two variables we created, the scaled measure of burglary (S_burglary), and the spatial lag measure (lag_s_burglary).
We also have our locm_bm object, which we created with the localmoran() function, that has calculated a variety of measures for each of our observations, which we explored with the summary() function. You can see (if you scroll up) that the 5th element in this object is the p-value (“Pr(z > 0)”). To call the nth element of an object, you can use the square brackets after its name. So to return the nth column of thing, you can use thing[,n]. Again this should not be new to you, as we’ve been doing this sort of thing for a while.
So the data we need for each observation, in order to identify whether it belongs to the high-high, low-low, high-low, or low-high quadrants are the standardised burglary score, the spatial lag score, and the p-value.
Essentially all we’ll be doing, is assigning a variable values based on where in the plot it is. So for example, if it’s in the upper right, it is high-high, and has values larger than 0 for both the burglary and the spatial lag values. It it’s in the upper left, it’s low-hih, and has a value larger than 0 for the spatial lag value, but lower than 0 on the burglary value. And so on, and so on. Here’s an image to illustrate:

So let’s first initialise this variable. In this instance we are creating a new column in the burglary_m dataframe and calling it quad_sig.
We are using the mutate() function from the tidyverse package to create our new variable, just as we have in previous labs.
We also use nested ifelse() statements. Nested ifelse() just means that it’s an ifelse() inside and ifelse() statement. To help us with these sorts of situations is the ifelse() function. We saw this with the previous exercises, but I’ll describe it brielfy again. It allows us to conditionally assign some value to some variable. The structure of the function is so that you have to pass it a condition, then a value to assign if the condition is true, and then another value if the condition is false. You are basically using this function to say: “if this condition is true, do first thing, else, do the second thing”. It would look something like this:
dataframe$new_variable <- ifelse(dataframe$some_numeric_var < 100, "smaller than 100", "not smaller than 100")
When nesting these, all you do is put another condition to check in the “thing to do if false”, so it checks all conditions. So in the first instance we check if the value for burglary is greater than zero, and the value for the lag is greater than zero, and the p-value is smaller than our threshol of 0.05. If it is, then this should belong to the “high-high” group. If any one of these conditions is not met, then we move into the ‘thing to do if false’ section, where we now check agains another set of criteria, and so on and so on. If none of these are met, we assign it the non-significant value:
burglary_m <- st_as_sf(burglary_m) %>%
mutate(quad_sig = ifelse(burglary_m$S_burglary > 0 &
burglary_m$lag_s_burglary > 0 &
locm_bm[,5] <= 0.05,
"high-high",
ifelse(burglary_m$S_burglary <= 0 &
burglary_m$lag_s_burglary <= 0 &
locm_bm[,5] <= 0.05,
"low-low",
ifelse(burglary_m$S_burglary > 0 &
burglary_m$lag_s_burglary <= 0 &
locm_bm[,5] <= 0.05,
"high-low",
ifelse(burglary_m$S_burglary <= 0 &
burglary_m$lag_s_burglary > 0 &
locm_bm[,5] <= 0.05,
"low-high",
"non-significant")))))
(Note we had to wrap our data in a st_as_sf() function, to covert back to sf object).
Now we can have a look at what this returns us:
table(burglary_m$quad_sig)
##
## high-high low-low non-significant
## 22 1 259
This looks like a lot of non-significant results. We want to be sure this isn’t an artefact of our code but is true, we can check how many values are under 0.05:
nrow(locm_bm[locm_bm[,5] <= 0.05,])
## [1] 23
We can see that only 23 areas have p-values under 0.05 threshold. So this is in line with our results, and we can rest assured.
Well, this is exciting, but where are these regions?
Let’s put ’em on a map, just simply, using quick thematic map (qtm()):
qtm(burglary_m, fill="quad_sig", fill.title="LISA")
Very nice!

So how do we interpret these results? Well keep in mind:
- The LISA value for each location is determined from its individual contribution to the global Moran’s I calculation.
- Whether or not this value is statistically significant is assessed by comparing the actual value to the value calculated for the same location by randomly reassigning the data among all the areal units and recalculating the values each time (the Monte Carlo simulation approach discussed earlier).
So essentially this map now tells us that there was statistically significant moderate clustering in burglaries in Manchester. When reporting your results, report at least the Moran’s I test value and the p value. So, for this test, you should report Moran’s I = 0.32, p < .001. Including the LISA cluster map is also a great way of showing how the attribute is actually clustering.
#Regression analysis (a refresher)
5.23 Introduction
In this session we are going to cover regression analysis or, rather, we are beginning to talk about regression modelling. This form of analysis has been one the main technique of data analysis in the social sciences for many years and it belongs to a family of techniques called generalised linear models. Regression is a flexible model that allows you to “explain” or “predict” a given outcome (Y), variously called your outcome, response or dependent variable, as a function of a number of what is variously called inputs, features or independent, explanatory, or predictive variables (X1, X2, X3, etc.). Following Gelman and Hill (2007), I will try to stick for the most part to the terms outputs and inputs.
Today we will cover something that is called linear regression or ordinary least squares regression (OLS), which is a technique that you use when you are interested in explaining variation in an interval level variable. First we will see how you can use regression analysis when you only have one input and then we will move to situations when we have several explanatory variables or inputs. For those of you already familiar with regression analysis this session can be a it of a refresher, for those that aren’t a bit of an introduction. Today we will cover regression models more generally and in the next lab we will discuss adaptations to the regression model that are necessary when you have spatial autocorrelation.
We will use a dataset that includes information about homicide in the US, as well as information in a number of sociodemographic variables that are often thought of as associated with the geographical distribution of homicides.
##R in Windows have some problems with https addresses, that's why we need to do this first:
urlfile<-'https://s3.amazonaws.com/geoda/data/ncovr.zip'
download.file(urlfile, 'ncovr.zip')
#Let's unzip and create a new directory (ncovr) in our working directory to place the files
unzip('ncovr.zip', exdir = 'ncovr')
There is quite a few files and folders here. We don’t need all of it. If you look inside the newly created folder you will find two subdirecoties MACOSX and ncovr. Look in the latter. You will see a pdf file called codebook.pdf. Read this file. It provides information on the variables and data that is included here. We are now going to read the main data file into R. This is the NAT.csv file.
ncovr <- read.csv('ncovr/ncovr/NAT.csv')
The dataset contains information about 3085 counties in the US and if you view it you will see it has information about several decades, the 60s, 70s, 80s, and 90s. The number at the end of the variable names denotes the relevant decade and you will see that for each decade we have the same variables.
The purpose of this and next session is to help you choose a model to represent the relationship between homicide and various predictors. You can think of a model as a map. A map aims to represent a given reality, but as you may have already discovered there are many ways of presenting the same information through a map. As an analyst you decide what the most appropriate representation for your needs is. Each representation you choose will involve an element of distortion. Maps (and models) are not exact representations of the real word, they are simply good approximations that may serve well a particular functional need. Look at the map of the metro in London that we reproduce below:
Figure 1
London doesn’t quite look like this. Yet, if you want to use the Metro system, this map will be extremely helpful to you. It serves a need in a good way. Same happens with models. They may not be terribly good reflections of the world, but may give us approximations that allows us to develop useful insights.
Choosing a good model is like choosing a good way for displaying quantitative information in a map. Decisions, decisions, decisions. There are many parameters and options one can choose from. This can be overwhelming, particularly as you are learning how to model and map phenomena. How to make good decisions is something that you learn on earnest by practice, practice, practice. Nobody expects you to get the maps you are doing as you are learning, and the models you are developing as you are learning spot on. So please do not stress out about this. All we can do here is to learn some basic principles and start getting some practice, which you will be able to further develop in aprofessional context or in further training.
The first step in any analysis is to develop some familiarity with the data you are going to be working with. We have been here before. Read the codebook. Run summary statistics for your quantitative variables, frequency distributions for your categorical variables, and visualise your variables. This will help you to detect any anomalies and give you a sense for what you have. If, for example, you run a histogram for the homicide rate for 1990 (HR90), you will get a sense of the distribution form –which of course is skewed.
library(ggplot2)
qplot(x = HR90, data = ncovr)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Once one has gone through the process of exploring the data in this way for all the variables you want to work, you can start exploring bivariate associations with your dependent, response or outcome variable. So, as an illustration, you could explore the association with resource deprivation (RD90), a measure of the level of concentrated disadvantge or social exclusion in an area, via a scatterplot:
ggplot(ncovr, aes(x = RD90, y = HR90)) +
geom_point(alpha=.2)

What do you think when looking at this scatterplot? Is there a relationship between the variables? Does it look as if individuals that have a high score on the X axis also have a high score on the Y axis? Or viceversa?
5.24 Motivating regression
Now, imagine that we play a game. Imagine I have all the respondents waiting in a room, and I randomly call one of them to the stage. You’re sitting in the audience, and you have to guess the level of fear (’tcviolent’) for that respondent. Imagine that I pay £150 to the student that gets the closest to the right value. What would you guess if you only have one guess and you knew (as we do) how fear of violent crime is distributed?
ggplot(ncovr, aes(x = HR90)) +
geom_density() +
geom_vline(xintercept = 4.377, linetype = "dashed", size = 1, color="red") +
geom_vline(xintercept = 6.183, linetype = "dashed", size = 1, color="blue") +
ggtitle("Density estimate, mean and median of homicide rate 1990")

summary(ncovr$HR90)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.334 4.377 6.183 8.938 71.378
If I only had one shot, you could go for the median, in red, (given the skew) but the mean, in blue, perhaps would be your second best. Most of the areas here have values clustered around those values, which is another way of saying they are bound to be not too far from them.
Imagine, however, that now when someone is called to the stage, you are told the level of resource deprivation in the county - so the value of the RD90 variable for the individual that has been selected (for example 4). Imagine as well that you have the scatterplot that we produced earlier in front of you. Would you still go for the value of “4.377” as your best guess for the value of the selected county?
I certainly would not go with the overall mean or median as my prediction anymore. If somebody said to me, the value ‘RD90’ for the selected respondent is 4, I would be more inclined to guess the mean value for the level of homicide with that level of resource deprivation (the conditional mean), rather than the overall mean across all the counties. Wouldn’t you?
If we plot the conditional means we can see that the mean of homicide rate for counties that report a value of 4 in RD90 is around 22. So you may be better off guessing that.
library(grid)
ggplot() +
geom_point(data=ncovr, aes(x = RD90, y = HR90), alpha=.2) +
geom_line(data=ncovr, aes(x = round(RD90/0.12)*0.12, y = HR90),
stat='summary',
fun.y=mean,
color="red",
size=1) +
annotate("segment", x=3, xend = 4, y = 25, yend= 22, color = "blue", size = 2, arrow = arrow()) +
annotate("text", x = 3, y = 29, label = "Pick this one!", size =7, colour = "blue")

Linear regression tackles this problem using a slightly different approach. Rather than focusing on the conditional mean (smoothed or not), it draws a straight line that tries to capture the trend in the data. If we focus in the region of the scatterplot that are less sparse we see that this is an upward trend, suggesting that as resource deprivation increases so does the homicide rate.
Simple linear regression draws a single straight line of predicted values as the model for the data. This line would be a model, a simplification of the real world like any other model (e.g., a toy pistol, an architectural drawing, a subway map), that assumes that there is approximately a linear relationship between X and Y. Let’s draw the regression line:
ggplot(data = ncovr, aes(x = RD90, y = HR90)) +
geom_point(alpha = .2) +
geom_smooth(method = "lm", se = FALSE, color = "red", size = 1) #This ask for a geom with the regression line, method=lm asks for the linear regression line, se=FALSE ask for just the line to be printed, the other arguments specify the color and thickness of the line

What that line is doing is giving you guesses (predictions) for the values of homicide based in the information that we have about the level of resource deprivation. It gives you one possible guess for the value of homicide for every possible value of resource deprivation and links them all together in a straight line.
Another way of thinking about this line is as the best possible summary of the cloud of points that are represented in the scatterplot (if we can assume that a straight line would do a good job doing this). If I were to tell you to draw a straight line that best represents this pattern of points the regression line would be the one that best does it (if certain assumptions are met).
The linear model then is a model that takes the form of the equation of a straight line through the data. The line does not go through all the points. In fact, you can see is a slightly less accurate representation than the (smoothed) conditional means:

Our regression line underpredicts at low levels of resource deprivation and does not seem to capture well the variability at higher levels of resource deprivation. But imperfect as a model as it might be it simplifies well the overall growing trend for homicide as resource deprivation increases.
As De Veaux et al (2012: 179) highlight: “like all models of the real world, the line will be wrong, wrong in the sense that it can’t match reality exactly. But it can help us understand how the variables are associated”. A map is never a perfect representation of the world, the same happens with statistical models. Yet, as with maps, models can be helpful.
5.25 Fitting a simple regression model
In order to draw a regression line we need to know two things: (1) We need to know where the line begins, what is the value of Y (our dependent variable) when X (our independent variable) is 0, so that we have a point from which to start drawing the line. The technical name for this point is the intercept. (2) And we need to know what is the slope of that line, that is, how inclined the line is, the angle of the line.
If you recall from elementary algebra (and you may not), the equation for any straight line is: y = mx + b In statistics we use a slightly different notation, although the equation remains the same: y = b0 + b1x
We need the origin of the line (b0) and the slope of the line (b1). How does R get the intercept and the slope for the green line? How does R know where to draw this line? We need to estimate these parameters (or coefficients) from the data. How? We don’t have the time to get into these more mathematical details now. You should study the required reading to understand this (required means it is required, it is not optional)1. For now, suffice to say that for linear regression modes like the one we cover here, when drawing the line, R tries to minimise the distance from every point in the scatterplot to the regression line using a method called least squares estimation.
In order to fit the model we use the lm() function using the formula specification (Y ~ X). Typically you want to store your regression model in a “variable”, let’s call it fit_1:
fit_1 <- lm(HR90 ~ RD90, data = ncovr)
You will see in your R Studio global environment space that there is a new object called fit_1 with 12 elements on it. We can get a sense for what this object is and includes using the functions we introduced in previous weeks:
class(fit_1)
## [1] "lm"
attributes(fit_1)
## $names
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
##
## $class
## [1] "lm"
R is telling us that this is an object of class lm and that it includes a number of attributes. One of the beauties of R is that you are producing all the results from running the model, putting them in an object, and then giving you the opportunity for using them later on. If you want to simply see the basic results from running the model you can use the summary() function.
summary(fit_1)
##
## Call:
## lm(formula = HR90 ~ RD90, data = ncovr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.796 -3.415 -0.719 2.540 67.103
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.18286 0.09844 62.81 <2e-16 ***
## RD90 3.77121 0.09846 38.30 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.468 on 3083 degrees of freedom
## Multiple R-squared: 0.3224, Adjusted R-squared: 0.3222
## F-statistic: 1467 on 1 and 3083 DF, p-value: < 2.2e-16
Or if you prefer more parsimonious presentation you could use the display() function of the arm package:
library(arm)
## Loading required package: MASS
## Loading required package: Matrix
## Loading required package: lme4
##
## arm (Version 1.10-1, built: 2018-4-12)
## Working directory is /Users/reka/Dropbox (The University of Manchester)/crimemapping_textbook_bookdown
display(fit_1)
## lm(formula = HR90 ~ RD90, data = ncovr)
## coef.est coef.se
## (Intercept) 6.18 0.10
## RD90 3.77 0.10
## ---
## n = 3085, k = 2
## residual sd = 5.47, R-Squared = 0.32
detach(package:arm) #This will unload the package
For now I just want you to focus on the numbers in the “Estimate” (or coef.est) column. The value of 6.18 estimated for the intercept is the “predicted” value for Y when X equals zero. This is the predicted value of the fear of crime score when resource deprivation has a value of zero.
summary(ncovr$RD90)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.4103 -0.6667 -0.2016 0.0000 0.4393 5.5831
RD90 is a variable that has been centered in ). It has been created by the researchers in such a way that is has a mean value of 0. Since we only have one explanatory variable in the model this corresponds to the mean of the homicide rate, 6.18. In many other contexts the intercept has less of a meaning.
We then need the b1 regression coefficient for for our independent variable, the value that will shape the slope in this scenario. This value is 3.77. This estimated regression coefficient for our independent variable has a convenient interpretation. When the value is positive, it tells us that for every one unit increase in X there is a b1 increase on Y. If the coefficient is negative then it represents a decrease on Y. Here, we can read it as “for every one unit increase in the resource deprivation score, there is a 3.77 unit increase in the homicide rate.”
Knowing these two parameters not only allows us to draw the line, we can also solve for any given value of X. Let’s go back to our guess-the-homicide-rate game. Imagine I tell you the level of resource deprivation is 1. What would be your best bet now? We can simply go back to our regression line equation and insert the estimated parameters:
y = b0 + b1x
y = 6.18 + 3.77 (1)
y = 9.95
Or if you don’t want to do the calculation yourself, you can use the predict function (differences are due to rounding error):
predict(fit_1, data.frame(RD90 = c(1))) #First you name your stored model and then you identify the new data (which has to be in a data frame format and with a variable name matching the one in the original data set)
## 1
## 9.954065
This is the expected value of Y, homicide rate, when X, resource deprivation is 1 according to our model (according to our simplification of the real world, our simplification of the whole cloud of points into just one straight line). Look back at the scatterplot we produced earlier with the red line. Does it look as if the green line when X is 1 corresponds to a value of Y of 9.95?
5.26 Residuals revisited: R squared
In the output above we saw there was something called the residuals. The residuals are the differences between the observed values of Y for each case minus the predicted or expected value of Y, in other words the distances between each point in the dataset and the regression line (see the visual example below).
drawing
You see that we have our line, which is our predicted values, and then we have the black dots which are our actually observed values. The distance between them is essentially the amount by which we were wrong, and all these distances between observed and predicted values are our residuals. Least square estimation essentially aims to reduce the squared average of all these distances: that’s how it draws the line.
Why do we have residuals? Well, think about it. The fact that the line is not a perfect representation of the cloud of points makes sense, doesn’t it? You cannot predict perfectly what the value of Y is for every observation just by looking ONLY at their level of resource deprivation! This line only uses information regarding resource deprivation. This means that there’s bound to be some difference between our predicted level of homicide given our knowledge of deprivation (the regression line) and the actual level of homicide (the actual location of the points in the scatterplot). There are other things that matter not being taken into account by our model to predict the values of Y. There are other things that surely matter in terms of understanding homicide. And then, of course, we have measurement error and other forms of noise.
We can re-write our equation like this if we want to represent each value of Y (rather than the predicted value of Y) then: y = b0 + b1x + residuals
The residuals capture how much variation is unexplained, how much we still have to learn if we want to understand variation in Y. A good model tries to maximise explained variation and reduce the magnitude of the residuals.
We can use information from the residuals to produce a measure of effect size, of how good our model is in predicting variation in our dependent variables. Remember our game where we try to guess homicide (Y)? If we did not have any information about X our best bet for Y would be the mean of Y. The regression line aims to improve that prediction. By knowing the values of X we can build a regression line that aims to get us closer to the actual values of Y (look at the Figure below).
r squared
The distance between the mean (our best guess without any other piece of information) and the observed value of Y is what we call the total variation. The residual is the difference between our predicted value of Y and the observed value of Y. This is what we cannot explain (i.e, variation in Y that is unexplained). The difference between the mean value of Y and the expected value of Y (the value given by our regression line) is how much better we are doing with our prediction by using information about X (i.e., in our previous example it would be variation in Y that can be explained by knowing about resource deprivation). How much closer the regression line gets us to the observed values. We can then contrast these two different sources of variation (explained and unexplained) to produce a single measure of how good our model is. The formula is as follows:
formula
All this formula is doing is taking a ratio of the explained variation (the squared differences between the regression line and the mean of Y for each observation) by the total variation (the squared differences of the observed values of Y for each observation from the mean of Y). This gives us a measure of the percentage of variation in Y that is “explained” by X. If this sounds familiar is because it is a measure similar to eta squared in ANOVA.
As then we can take this value as a measure of the strength of our model. If you look at the R output you will see that the R2 for our model was .32 (look at the multiple R square value in the output) . We can say that our model explains 32% of the variance in the fear of homicide. when doing regression, you will often find that regression models with aggregate data such as county level data will give you better results than when dealing with individuals. It is much harder understanding individual variation than county level variation.
#As an aside, and to continue emphasising your appreciation of the object oriented nature of R, when we run the summary() function we are simply generating a list object of the class summary.lm.
attributes(summary(fit_1))
## $names
## [1] "call" "terms" "residuals" "coefficients"
## [5] "aliased" "sigma" "df" "r.squared"
## [9] "adj.r.squared" "fstatistic" "cov.unscaled"
##
## $class
## [1] "summary.lm"
#This means that we can access its elements if so we wish. So, for example, to obtain just the R Squared, we could ask for:
summary(fit_1)$r.squared
## [1] 0.3224335
Knowing how to interpret this is important. R^2 ranges from 0 to 1. The greater it is the more powerful our model is, the more explaining we are doing, the better we are able to account for variation in our outcome Y with our input. In other words, the stronger the relationship is between Y and X. As with all the other measures of effect size interpretation is a matter of judgement. You are advised to see what other researchers report in relation to the particular outcome that you may be exploring.
Weisburd and Britt (2009: 437) suggest that in criminal justice you rarely see values for R^2 greater than .40. Thus, if your R^2 is larger than .40, you can assume you have a powerful model. When, on the other hand, R^2 is lower than .15 or .2 the model is likely to be viewed as relatively weak. Our observed r squared here is rather poor. There is considerably room for improvement if we want to develop a better model to explain fear of violent crime2. In any case, many people would argue that R^2 is a bit overrated. You need to be aware of what it measures and the context in which you are using it. Read here for some additional detail.
5.27 Inference with regression
In real applications, we have access to a set of observations from which we can compute the least squares line, but the population regression line is unobserved. So our regression line is one of many that could be estimated. A different sample would produce a different regression line. The same sort of ideas that we introduced when discussing the estimation of sample means or proportions also apply here. if we estimate b0 and b1 from a particular sample, then our estimates won’t be exactly equal to b0 and b1 in the population. But if we could average the estimates obtained over a very large number of data sets, the average of these estimates would equal the coefficients of the regression line in the population.
We can compute standard errors for the regression coefficients to quantify our uncertainty about these estimates. These standard errors can in turn be used to produce confidence intervals. This would require us to assume that the residuals are normally distributed. As seen in the image, and for a simple regression model, you are assuming that the values of Y are approximately normally distributed for each level of X:
normalityresiduals
In those circumstances we can trust the confidence intervals that we can draw around the regression line as in the image below:
estimated
The dark-blue line marks the best fit. The two dark-pink lines mark the limits of the confidence interval. The light-pink lines show the sampling distributions around each of the confidence-interval limits (the many regression lines that would result from repeated sampling); notice that the best-fit line falls at the extreme of each sampling distribution.
You can also then perform standard hypothesis test on the coefficients. As we saw before when summarising the model, R will compute the standard errors and a t test for each of the coefficients.
summary(fit_1)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.182860 0.09844166 62.80735 0.000000e+00
## RD90 3.771206 0.09845761 38.30283 6.535551e-263
In our example, we can see that the coefficient for our predictor here is statistically significant3.
We can also obtain confidence intervals for the estimated coefficients using the confint() function:
confint(fit_1)
## 2.5 % 97.5 %
## (Intercept) 5.989842 6.375877
## RD90 3.578156 3.964255
5.28 Fitting regression with categorical predictors
So far we have explained regression using a numeric input. It turns out we can also use regression with categorical explanatory variables. It is quite straightforward to run it.
There is only one categorical explanatory variable in this dataset, a binary indicator that indicates whether the county is in a Southern State or not. We can also explore this relationship using regression and a regression line. This is how you would express the model:
#We use the as.factor function to tell R that SOUTH is a categorical variable
fit_2 <- lm(HR90 ~ as.factor(SOUTH), data=ncovr)
Notice that there is nothing different in how we ask for the model. And see below the regression line:
ggplot(data=ncovr, aes(x=as.factor(SOUTH), y=HR90)) +
geom_point(alpha=.2, position="jitter", color="orange") +
geom_abline(intercept = fit_2$coefficients[1],
slope = fit_2$coefficients[2], color="red", size=1) +
theme_bw()

Let’s have a look at the results:
summary(fit_2)
##
## Call:
## lm(formula = HR90 ~ as.factor(SOUTH), data = ncovr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.549 -3.342 -1.172 1.931 68.036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.3416 0.1437 23.25 <2e-16 ***
## as.factor(SOUTH)1 6.2077 0.2124 29.22 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.878 on 3083 degrees of freedom
## Multiple R-squared: 0.2169, Adjusted R-squared: 0.2167
## F-statistic: 854 on 1 and 3083 DF, p-value: < 2.2e-16
As you will see the output does not look too different. But notice that in the print out you see how the row with the coefficient and other values for our input variable SOUTH we see that R is printing 1. What does this mean?
It turns out that a linear regression model with just one dichotomous categorical predictor is just the equivalent of a t test. When you only have one predictor the value of the intercept is the mean value of the reference category and the coefficient for the slope tells you how much higher (if it is positive) or how much lower (if it is negative) is the mean value for the other category in your factor.
The reference category is the one for which R does not print the level next to the name of the variable for which it gives you the regression coefficient. Here we see that the named level is “1”. That’s telling you that the reference category here is “0”. If you look at the codebook you will see that 1 means the county is in a Southern state. Therefore the Y intercept in this case is the mean value of fear of violent crime for the northern counties, whereas the coefficient for the slope is telling you how much higher (since it is a positive value) the mean value is for the southern counties. Don’t believe me?
mean(ncovr$HR90[ncovr$SOUTH == 0], na.rm=TRUE)
## [1] 3.341614
mean(ncovr$HR90[ncovr$SOUTH == 1], na.rm=TRUE) - mean(ncovr$HR90[ncovr$SOUTH == 0], na.rm=TRUE)
## [1] 6.207679
So, to reiterate, for a binary predictor, the coefficient is nothing else than the difference between the mean of the two levels in your factor variable, between the averages in your two groups.
With categorical variables encoded as factors you always have a situation like this: a reference category and then as many additional coefficients as there are additional levels in your categorical variable. Each of these additional categories is included into the model as “dummy” variables. Here our categorical variable has two levels, thus we have only one dummy variable. There will always be one fewer dummy variable than the number of levels. The level with no dummy variable, northern counties in this example, is known as the reference category or the baseline.
It turns out then that the regression table is printing out for us a t test of statistical significance for every input in the model. If we look at the table above this t value is 29.22 and the p value associated with it is near 0. This is indeed considerably lower than the conventional significance level of 0.05. So we could conclude that the probability of obtaining this value if the null hypothesis is true is very low. The r squared is not too bad either, although lower than we saw when using resource deprivation.
5.29 Motivating multiple regression
So we have seen that we can fit models with just one predictor. We can build better models by expanding the number of predictors (although keep in mind you should also aim to build models as parsimonious as possible).
Another reason why it is important to think about additional variables in your model is to control for spurious correlations (although here you may also want to use your common sense when selecting your variables!). You must have heard before that correlation does not equal causation. Just because two things are associated we cannot assume that one is the cause for the other. Typically we see how the pilots switch the secure the belt button when there is turbulence. These two things are associated, they tend to come together. But the pilots are not causing the turbulences by pressing a switch! The world is full of spurious correlations, associations between two variables that should not be taking too seriously. You can explore a few here. It’s funny.
Looking only at covariation between pair of variables can be misleading. It may lead you to conclude that a relationship is more important than it really is. This is no trivial matter, but one of the most important ones we confront in research and policy4.
It’s not an exaggeration to say that most quantitative explanatory research is about trying to control for the presence of confounders, variables that may explain explain away observed associations. Think about any criminology question: Does marriage reduces crime? Or is it that people that get married are different from those that don’t (and are those pre-existing differences that are associated with less crime)? Do gangs lead to more crime? Or is it that young people that join gangs are more likely to be offenders to start with? Are the police being racist when they stop and search more members of ethnic minorities? Or is it that there are other factors (i.e., offending, area of residence, time spent in the street) that, once controlled, would mean there is no ethnic dis-proportionality in stop and searches? Does a particular program reduces crime? Or is the observed change due to something else?
These things also matter for policy. Wilson and Kelling, for example, argued that signs of incivility (or antisocial behaviour) in a community lead to more serious forms of crime later on as people withdraw to the safety of their homes when they see those signs of incivilities and this leads to a reduction in informal mechanisms of social control. All the policies to tackle antisocial behaviour in this country are very much informed by this model and were heavily influenced by broken windows theory.
But is the model right? Sampson and Raudenbush argue it is not entirely correct. They argue, and tried to show, that there are other confounding (poverty, collective efficacy) factors that explain the association of signs of incivility and more serious crime. In other words, the reason why you see antisocial behaviour in the same communities that you see crime is because other structural factors explain both of those outcomes. They also argue that perceptions of antisocial behaviour are not just produced by observed antisocial behaviour but also by stereotypes about social class and race. If you believe them, then the policy implications are that only tackling antisocial behaviour won’t help you to reduce crime (as Wilson and Kelling have argued) . So as you can see this stuff matters for policy not just for theory.
Multiple regression is one way of checking the relevance of competing explanations. You could set up a model where you try to predict crime levels with an indicator of broken windows and an indicator of structural disadvantage. If after controlling for structural disadvantage you see that the regression coefficient for broken windows is still significant you may be into something, particularly if the estimated effect is still large. If, on the other hand, the t test for the regression coefficient of your broken windows variable is no longer significant, then you may be tempted to think that perhaps Sampson and Raudenbush were into something.
5.30 Fitting and interpreting a multiple regression model
It could not be any easier to fit a multiple regression model. You simply modify the formula in the lm() function by adding terms for the additional inputs.
ncovr$SOUTH_f <- as.factor(ncovr$SOUTH)
fit_3 <- lm(HR90 ~ RD90 + SOUTH_f, data=ncovr)
summary(fit_3)
##
## Call:
## lm(formula = HR90 ~ RD90 + SOUTH_f, data = ncovr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.480 -2.996 -0.576 2.216 68.151
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.7270 0.1394 33.90 <2e-16 ***
## RD90 2.9649 0.1108 26.77 <2e-16 ***
## SOUTH_f1 3.1809 0.2223 14.31 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.295 on 3082 degrees of freedom
## Multiple R-squared: 0.3647, Adjusted R-squared: 0.3642
## F-statistic: 884.4 on 2 and 3082 DF, p-value: < 2.2e-16
With more than one input, you need to ask yourself whether all of the regression coefficients are zero. This hypothesis is tested with a F test. Again we are assuming the residuals are normally distributed, though with large samples the F statistics approximates the F distribution.
You see the F test printed at the bottom of the summary output and the associated p value, which in this case is way below the conventional .05 that we use to declare statistical significance and reject the null hypothesis. At least one of our inputs must be related to our response variable.
Notice that the table printed also reports a t test for each of the predictors. These are testing whether each of these predictors is associated with the response variable when adjusting for the other variables in the model. They report the “partial effect of adding that variable to the model” (James et al. 2014: 77). In this case we can see that both variables seem to be significantly associated with the response variable.
If we look at the r squared we can now see that it is higher than before. r squared will always increase as a consequence of adding new variables, even if the new variables added are weakly related to the response variable.
We see that the coefficients for the predictors change somehow, it goes down a bit for RD90 and it halvesfor SOUTH. But their interpretation now changes. A common interpretation is that now the regression for each variable tells you about changes in Y related to that variable when the other variables in the model are held constant. So, for example, you could say the coefficient for RD90 represents the increase in hommicide for every one-unit increase in the measure of resource deprivation when holding all other variables in the model constant (in this case that refers to holding constant SOUTH). But this terminology can be a bit misleading.
Other interpretations are also possible and are more generalizable. Gelman and Hill (2007: p. 34) emphasise what they call the predictive interpretation that considers how “the outcome variable differs, on average, when comparing two groups of units that differ by 1 in the relevant predictor while being identical in all the other predictors”. So if you’re regressing y on u and v, the coefficient of u is the average difference in y per difference in u, comparing pairs of items that differ in u but are identical in v.
So, for example, in this case we could say that comparing counties that have the same level of resource deprivation but that differed in whether they are South or North, the model predicts a expected difference of 3.18 in their homicide rate. And that respondents that do not vary in whether they are South or North, but that differ by one point in the level of resource deprivation, we would expect to see a difference of 2.96 in their homicide rate. So we are interpreting the regression slopes as comparisons of cases that differ in one predictor while being at the same levels of the other predictors.
As you can see, interpreting regression coefficients can be kind of tricky5. The relationship between the response y and any one explanatory variable can change greatly depending on what other explanatory variables are present in the model.
For example, if you contrast this model with the one we run with only SOUTH as a predictor you will notice the intercept has changed. You cannot longer read the intercept as the mean value of homicide rate for Northern counties. Adding predictors to the model changes their meaning. Now the intercept index the value of homicide for southern counties that score 0 in RD90. In this case you have cases that meet this condition (equal zero in all your predictors), but often you may not have any case that does meet the definition of the intercept. More often than not, then, there is not much value in bothering to interpret the intercept.
Something you need to be particularly careful about is to interpret the coefficients in a causal manner. At least your data come from an experiment this is unlikely to be helpful. With observational data regression coefficients should not be read as indexing causal relations. This sort of textbook warning is, however, often neglectfully ignored by professional researchers. Often authors carefully draw sharp distinctions between causal and correlational claims when discussing their data analysis, but then interpret the correlational patterns in a totally causal way in their conclusion section. This is what is called the causation or causal creep. Beware. Don’t do this tempting as it may be.
Comparing the simple models with this more complex model we could say that adjusting for SOUTH does not change much the impact of RD90 in homicide, but that adjusting for resource deprivation halves the impact of the regional effect on homicide.
5.31 Presenting your regression results.
Communicating your results in a clear manner is incredibly important. We have seen the tabular results produced by R. If you want to use them in a paper you may need to do some tidying up of those results. There are a number of packages (textreg, stargazer) that automatise that process. They take your lm objects and produce tables that you can put straight away in your reports or papers. One popular trend in presenting results is the coefficient plot as an alternative to the table of regression coefficients. There are various ways of producing coefficient plots with R for a variety of models. See here and here, for example.
We are going to use instead the plot_model() function of the sjPlot package, that makes it easier to produce this sort of plots. You can find a more detailed tutorial about this function here. See below for an example:
library(sjPlot)
## Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
## TMB was built with Matrix version 1.2.15
## Current Matrix version is 1.2.14
## Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
## #refugeeswelcome
Let’s try with a more complex example:
fit_4 <- lm(HR90 ~ RD90 + SOUTH_f + DV90 + MA90 + PS90, data=ncovr)
plot_model(fit_4, breakLabelsAt = 30)

Visual display of the effects of the variables in the model are particularly helpful. The effects package allows us to produce plots to visualise these relationships (when adjusting for the other variables in the model). Here’s an example going back to our model fit_3 which contained SOUTH and RD90 predictor variables:
library(effects)
## Loading required package: carData
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
plot(allEffects(fit_3), ask=FALSE)

Notice that the line has a confidence interval drawn around it and that the predicted means for southern and northern counties (when controlling for RD90) also have a confidence interval.
5.32 Rescaling input variables to assist interpretation
The interpretation or regression coefficients is sensitive to the scale of measurement of the predictors. This means one cannot compare the magnitude of the coefficients to compare the relevance of variables to predict the response variable. Let’s look at the more recent model, how can we tell what predictors have a stronger effect?
summary(fit_4)
##
## Call:
## lm(formula = HR90 ~ RD90 + SOUTH_f + DV90 + MA90 + PS90, data = ncovr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.740 -2.588 -0.678 1.708 69.180
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.20350 0.98475 4.269 2.03e-05 ***
## RD90 3.19923 0.11167 28.648 < 2e-16 ***
## SOUTH_f1 2.59975 0.21557 12.060 < 2e-16 ***
## DV90 0.47594 0.05308 8.967 < 2e-16 ***
## MA90 -0.07609 0.02743 -2.774 0.00557 **
## PS90 1.26451 0.10047 12.587 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.035 on 3079 degrees of freedom
## Multiple R-squared: 0.4262, Adjusted R-squared: 0.4252
## F-statistic: 457.3 on 5 and 3079 DF, p-value: < 2.2e-16
We just cannot. One way of dealing with this is by rescaling the input variables. A common method involves subtracting the mean and dividing by the standard deviation of each numerical input. The coefficients in these models is the expected difference in the response variable, comparing units that differ by one standard deviation in the predictor while adjusting for other predictors in the model.
Instead, Gelman (2008) has proposed dividing each numeric variables by two times its standard deviation, so that the generic comparison is with inputs equal to plus/minus one standard deviation. As Gelman explains the resulting coefficients are then comparable to untransformed binary predictors. The implementation of this approach in the arm package subtract the mean of each binary input while it subtract the mean and divide by two standard deviations for every numeric input.
The way we would obtain these rescaled inputs uses the standardize() function of the arm package, that takes as an argument the name of the stored fit model.
library(arm)
##
## arm (Version 1.10-1, built: 2018-4-12)
## Working directory is /Users/reka/Dropbox (The University of Manchester)/crimemapping_textbook_bookdown
standardize(fit_4)
##
## Call:
## lm(formula = HR90 ~ z.RD90 + c.SOUTH_f + z.DV90 + z.MA90 + z.PS90,
## data = ncovr)
##
## Coefficients:
## (Intercept) z.RD90 c.SOUTH_f z.DV90 z.MA90
## 6.1829 6.3985 2.5998 1.6497 -0.5478
## z.PS90
## 2.5290
Notice the main change affects the numerical predictors. The unstandardised coefficients are influenced by the degree of variability in your predictors, which means that typically they will be larger for your binary inputs. With unstandardised coefficients you are comparing complete change in one variable (whether one is a Southern county or not) with one-unit changes in your numerical variable, which may not amount to much change. So, by putting in a comparable scale, you avoid this problem.
Standardising in the way described here will help you to make fairer comparisons. This standardised coefficients are comparable in a way that the unstandardised coefficients are not. We can now see what inputs have a comparatively stronger effect. It is very important to realise, though, that one should not compare standardised coefficients across different models.
5.33 Testing conditional hypothesis: interactions
In the social sciences there is a great interest in what are called conditional hypothesis or interactions. Many of our theories do not assume simply additive effects but multiplicative effects.For example, Wikstrom and his colleagues (2011) suggest that the threat of punishment only affects the probability of involvement on crime for those that have a propensity to offend but are largely irrelevant for people who do not have this propensity. Or you may think that a particular crime prevention programme may work in some environments but not in others. The interest in this kind of conditional hypothesis is growing.
One of the assumptions of the regression model is that the relationship between the response variable and your predictors is additive. That is, if you have two predictors x1 and x2. Regression assumes that the effect of x1 on y is the same at all levels of x2. If that is not the case, you are then violating one of the assumptions of regression. This is in fact one of the most important assumptions of regression, even if researchers often overlook it.
One way of extending our model to accommodate for interaction effects is to add additional terms to our model, a third predictor x3, where x3 is simply the product of multiplying x1 by x2. Notice we keep a term for each of the main effects (the original predictors) as well as a new term for the interaction effect. “Analysts should include all constitutive terms when specifying multiplicative interaction models except in very rare circumstances” (Brambor et al., 2006: 66).
How do we do this in R? One way is to use the following notation in the formula argument. Notice how we have added a third term RD90:SOUTH_f, which is asking R to test the conditional hypothesis that resource deprivation may have a different impact on homicide for soutern and northern counties.
fit_5 <- lm(HR90 ~ RD90 + SOUTH_f + RD90:SOUTH_f , data=ncovr)
# which is equivalent to:
# fit_5 <- lm(HR90 ~ RD90 * SOUTH_f , data=ncovr)
summary(fit_5)
##
## Call:
## lm(formula = HR90 ~ RD90 + SOUTH_f + RD90:SOUTH_f, data = ncovr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.055 -2.998 -0.566 2.227 68.136
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.5477 0.1586 28.675 <2e-16 ***
## RD90 2.5814 0.1963 13.148 <2e-16 ***
## SOUTH_f1 3.2612 0.2247 14.515 <2e-16 ***
## RD90:SOUTH_f1 0.5622 0.2377 2.365 0.0181 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.292 on 3081 degrees of freedom
## Multiple R-squared: 0.3658, Adjusted R-squared: 0.3652
## F-statistic: 592.4 on 3 and 3081 DF, p-value: < 2.2e-16
You see here that essentially you have only two inputs (resource deprivation and south) but several regression coefficients. Gelman and Hill (2007) suggest reserving the term input for the variables encoding the information and to use the term predictor to refer to each of the terms in the model. So here we have two inputs and three predictors (one for SOUTH, another for resource deprivation, and a final one for the interaction effect).
In this case the test for the interaction effect is significant, which suggests there is such an interaction. Let’s visualise the results with the effects package:
plot(allEffects(fit_5), ask=FALSE)

Notice that essentially what we are doing is running two regression lines and testing whether the slope is different for the two groups. The intercept is different, we know that Southern counties are more violent, but what we are testing here is whether the level of homicide goes up in a steeper fashion (and in the same direction) for one or the other group as the level of resource deprivation goes up. We see that’s the case here. The estimated lines are almost parallel, but the slopw is a bit more steep in the Southern counties. In southern counties resource deprivation seems to have more of an impact on homicide than in northern counties.
A word of warning, the moment you introduce an interaction effect the meaning of the coefficients for the other predictors changes (what it is often referred as to the “main effects” as opposed to the interaction effect). You cannot retain the interpretation we introduced earlier. Now, for example, the coefficient for the SOUTH variable relates the marginal effect of this variable when RD90 equals zero. The typical table of results helps you to understand whether the effects are significant but offers little of interest that will help you to meaningfully interpret what the effects are. For this is better you use some of the graphical displays we have covered.
Essentially what happens is that the regression coefficients that get printed are interpretable only for certain groups. So now:
The intercept still represents the predicted score of homicide for southern counties and have a score of 0 in resurce deprivation (as before).
The coefficient of
SOUTH_f1now can be thought of as the difference between the predicted score of homicide rate for northern counties that have a score of 0 in resource deprivation and northern counties that have a score of 0 in resource deprivation.The coefficient of
RD90now becomes the comparison of mean homicide rate for southern counties who differ by one point in resource deprivation.The coefficient for the interaction term represents the difference in the slope for
RD90comparing southern and northern counties, the difference in the slope of the two lines that we visualised above.
Models with interaction terms are too often misinterpreted. I strongly recommend you read this piece by Brambor et al (2005) to understand some of the issues involved. When discussing logistic regression we will return to this and will consider tricks to ease the interpretation.
Equally, John Fox (2003) piece on the effects package goes to much more detail that we can here to explain the logic and some of the options that are available when producing plots to show interactions with this package.
5.34 Model building and variable selection
How do you construct a good model? This partly depends on your goal, although there are commonalities. You do want to start with theory as a way to select your predictors and when specifying the nature of the relationship to your response variable (e.g., additive, multiplicative). Gelman and Hill (2007) provide a series of general principles6. I would like to emphasise at this stage two of them:
Include all input variables that, for substantive reasons, might be expected to be important in predicting the outcome.
For inputs with large effects, consider including their interactions as well.
It is often the case that for any model, the response variable is only related to a subset of the predictors. There are some scenarios where you may be interested in understanding what is the best subset of predictors. Imagine that you want to develop a risk assessment tool to be used by police officers that respond to a domestic violence incident, so that you could use this tool for forecasting the future risk of violence. There is a cost to adding too many predictors. A police officer time should not be wasted gathering information on predictors that are not associated with future risk. So you may want to identify the predictors that will help in this process.
Ideally, we would like to perform variable selection by trying out a lot of different models, each containing a different subset of the predictors. There are various statistics that help in making comparisons across models. Unfortunately, as the number of potentially relevant predictors increases the number of potential models to compare increases exponentially. So you need methods that help you in this process. There are a number of tools that you can use for variable selection but this goes beyond the aims of this introduction. If you are interested you may want to read this.
5.35 Regression assumptions
Although so far we have discussed the practicalities of fitting and interpreting regression models, in practical applications you want to first check your model and proceed from there. Not much point spending time interpreting your model until you know that the model reasonably fits your data.
In previous data analysis modules we surely covered assumptions made by various statistical tests. The regression model also makes assumptions of its own. In fact, there are so many that we could spend an entire class discussing them. Gelman and Hill (2007) point out that the most important regression assumptions by decreasing order of importance are:
- Validity. The data should be appropriate for the question that you are trying to answer:
“Optimally, this means that the outcome measure should accurately reflect the phenomenon of interes, the model should include all relevant predictors, and the model should generalize to all cases to which it will be applied… Data used in empirical research rarely meet all (if any) of these criteria precisely. However, keeping these goals in mind can help you be precise about the types of questions you can and cannot answer reliably”
Additiviy and linearity. These are the most important mathematical assumptions of the model. We already talked about additivity in previous sessions and discussed how you can include interaction effects in your models if the additivity assumption is violated. We will discuss problems with non-linearities today as well as ways to diagnose and solve this problem. If the relationship is non linear (e.g, it is curvilinear) predicted values will be wrong in a biased manner, meaning that predicted values will systematically miss the true pattern of the mean of y (as related to the x-variables).
Independence of errors. Regression assumes that the errors from the prediciton line (or hyperplane) are independent. If there is dependency between the observations (you are assessing change across the same units, working with spatial units, or with units that are somehowed grouped such as students from the same class), you may have to use models that are more appropriate (e.g., multilevel models, spatial regression, etc.).
Equal variances of errors. When the variance of the residuals is unequal, you may need different estimation methods. This is, nonetheless, considered a minor issue. There is a small effect on the validity of t-test and F-test results, but generally regression inferences are robust with regard to the variance issue.
Normality of errors. The residuals should be normally distributed. Gelman and Hill (2007: 46) discuss this as the least important of the assumptions and in fact “do not recommend diagnostics of the normality of the regression residuals”. If the errors do not have a normal distribution, it usually is not particularly serious. Regression inferences tend to be robust with respect to normality (or nonnormality of the errors). In practice, the residuals may appear to be nonnormal when the wrong regression equation has been used. So, I will show you how to inspect normality of the residuals not because this is a problem on itself, but because it may be give you further evidence that there is some other problem with the model you are applying to your data.
Apart from this, it is convenient to diagnose multicollinearity (this affects interpretation) and influential observations.
So these are the assumptions of linear regression, and today we will go through how to test for them, and also what are some options that you can consider if you find that your model violates them. While finding that some of the assumptions are violated do not necessarily mean that you have to scrap your model, it is important to use these diagnostics to illustrate that you have considered what the possible issues with your model is, and if you find any serious issues that you address them.
You may have noticed the second of this assumption is independence of errors. This is an issue with spatial data. If you have spatial autocorrelation basically you are syaing that your observations are not independent. What happens in area X is likely to be similar to what happens in its surrounding neighbors (if you have positive spatial autocorrelation). What do you do? Well, that’s what we will cover next week. We will learn how to fit regression models where you have spatial dependency.
5.35.1 HOMEWORK
Fit a regression model with a few relevant explanatory variables for the homicide rate in 1970. Make sure you interpret your results.
#Spatial regression models
5.36 Introduction
Last week we provided you with an introduction to regression analysis with R. the data we used had a spatial component. We were modelling the geographical distribution of homicide across US counties. However, we did not incorporated this spatial component into our models. As we have explained througout the semester criminal events often cluster geographically in space. So if we want to develop a regression model for crime we may have to recognise this spatial component. Remember as well, from last week, that regression models assume independence between the observations. That is, a regression model is formally assuming that what happens in area Xi is not in any way related (it is independent) of what happens in area Xii. But if those two areas are adjacent in geographical space we know that there is a good chance that this assumption may be violated. In previous weeks we covered formal tests for spatial autocorrelation, which allow us to test whether this assumption is met or not. So before we fit a regression model with spatial data we need to explore the issue of autocorrelation. We already know how to do this. In this session, we will examine the data from last week, explore whether autocorrelation is an issue, and then introduce models that allow us to take into account spatial autocorrelation. We will see that there are two basic ways of adjusting for spatial autocorrelation: through a spatial lag model or through a spatial error model.
Before we do any of this, we need to load the libraries we will use today:
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3
library(tmap)
library(sp)
library(spdep)
## Loading required package: Matrix
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source'))`
Then we will bring back the data from last week:
##R in Windows have some problems with https addresses, that's why we need to do this first:
urlfile<-'https://s3.amazonaws.com/geoda/data/ncovr.zip'
download.file(urlfile, 'ncovr.zip')
#Let's unzip and create a new directory (ncovr) in our working directory to place the files
unzip('ncovr.zip', exdir = 'ncovr')
Last week we did not treated the data as spatial and, consequently, relied on the csv file. But notice that in the unzip ncovr file there is also a shapefile that we can load as a spatial object into R:
shp_name <- "ncovr/ncovr/NAT.shp"
ncovr_sf <- st_read(shp_name)
## Reading layer `NAT' from data source `/Users/reka/Dropbox (The University of Manchester)/crimemapping_textbook_bookdown/ncovr/ncovr/NAT.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3085 features and 69 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -124.7314 ymin: 24.95597 xmax: -66.96985 ymax: 49.37173
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
We can indeed represent our variable of interest using a choropleth map.
current_style <- tmap_style("col_blind")
## tmap style set to "col_blind"
## other available styles are: "white", "gray", "natural", "cobalt", "albatross", "beaver", "bw", "classic", "watercolor"
tm_shape(ncovr_sf) +
tm_fill("HR90", title = "Homicide Rate (Quantiles)", style="quantile", palette = "Reds") +
tm_borders(alpha = 0.1) +
tm_layout(main.title = "Homicide Rate across US Counties, 1990", main.title.size = 0.7 ,
legend.position = c("right", "bottom"), legend.title.size = 0.8)

Do you think there is spatial patterning to homicide?
5.37 Looking at the residuals and testing for spatial autocorrelation in regression
Residuals, as we have explained, give you an idea of the distance between our observed Y values and the predicted Y values. So in essence they are deviations of reality from your model. Your regression line or hyperplane is optimised to be the one that best represent your data if those assumptions are met. Residuals are very helpful to diagnose, then, whether your model is a good representation of reality or not. Most diagnostics of the assumptions for OLS regression, which is if you remember from last year the technique we are using, rely on exploring the residuals.
In order to explore the residuals we need to fit our model first. Let’s look at one of the models from last week.
fit_1 <- lm(HR90 ~ RD90 + SOUTH + DV90 + MA90 + PS90 +UE90, data=ncovr_sf)
Now that we have fitted the model we can extract the residuals. If you look at the fit_1 object in your RStudio environment or if you run the str() function to look inside this object you will see that this object is a list with differet elements, one of which is the residuals. An element of this object then includes the residual for each of your observations (the difference between the observed value and the value predicted by your model). We can extract the residuals using the residuals() function and add them to our spatial data set.
ncovr_sf$res_fit1 <- residuals(fit_1)
If you now look at the dataset you will see that there is a new variable with the residuals. In those cases where the residual is negative this is telling us that the observed value is lower than the predicted (that is, our model is overpredicting the level of homicide for that observation) when the residual is positive the observed value is higher than the predicted (that is, our model is underpredicting the level of homicide for that observation).
We could also extract the predicted values if we wanted. We would use the fitted() function.
ncovr_sf$fitted_fit1 <- fitted(fit_1)
Now look at the second county in the dataset. It has a homice rate in 1990 of 15.88. This is the observed value. If we look at the new column we have created (fitted_fit1), our model predicts a homicide rate of 2.41. That is, knowing the level unemployment, whether the county is North or South, the level of resource deprivation, etc., we are predicting a homicide rate of 2.41. Now, this is lower than the observed value, so our model is underpredicting the level of homicide in this case. If you observed the residual you will see that it has a value of 13.46, which is simply the difference between the observed and the predicted value.
With spatial data one useful thing to do is to look at any spatial patterning in the distribution of the residuals. Notice that the residuals are the difference between the observed values for homicide and the predicted values for homicide, so you want your residual to NOT display any spatial patterning. If, on the other hand, your model display a patterning in the areas of the study region where it performs predicts badly, then you may have a problem. This is telling your model is not a good representation of the social phenomena you are studying across the full study area: there is systematically more distortion in some areas than in others.
We are going to produce a choropleth map for the residuals, but we will use a common classification method we haven’t covered yet: standard deviations. Standard deviation is a statistical technique type of map based on how much the data differs from the mean. You measure the mean and standard deviation for your data. Then, each standard deviation becomes a class in your choropleth maps.
In order to do that we will compute the mean and the standard deviation for the variable we want to plot and break the variable according to these values. The following code creates a new variable in which we will express the residuals in terms of standard deviations away from the mean. So, for each observation, we substract the mean and divide by the standard deviation.
ncovr_sf$sd_breaks <- (ncovr_sf$res_fit1 - mean(ncovr_sf$res_fit1))/sd(ncovr_sf$res_fit1)
summary(ncovr_sf$sd_breaks)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.5370 -0.5238 -0.1404 0.0000 0.3314 13.7407
Next we use a new style, fixed, within the tm_fill function. When we break the variable into classes using the fixed argument we need to specify the boundaries of the classes. We do this using the breaks argument. In this case we are going to ask R to create 7 classes based on standard deviations away from the mean.
my_breaks <- c(-14,-3,-2,-1,1,2,3,14)
tm_shape(ncovr_sf) +
tm_fill("sd_breaks", title = "Residuals", style = "fixed", breaks = my_breaks, palette = "-RdBu") +
tm_borders(alpha = 0.1) +
tm_layout(main.title = "Residuals", main.title.size = 0.7 ,
legend.position = c("right", "bottom"), legend.title.size = 0.8)
## Variable "sd_breaks" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Notice the spatial patterning of areas of over-prediction (negative residuals, or blue tones) and under-prediction (positive residuals, or brown tones). This visual inspection of the residuals is telling you that spatial autocorrelation may be present here. This, however, would require a more formal test.
Remember from week 6 that in order to do this first we need to turn our sf object into a sp class object and then create the spatial weight matrix. If the code below and what it does is not clear to you, revise the notes from week 6 -when we first introduced it.
#We coerce the sf object into a new sp object
ncovr_sp <- as(ncovr_sf, "Spatial")
#Then we create a list of neighbours using the Queen criteria
w <- poly2nb(ncovr_sp, row.names=ncovr_sp$FIPSNO)
summary(w)
## Neighbour list object:
## Number of regions: 3085
## Number of nonzero links: 18168
## Percentage nonzero weights: 0.190896
## Average number of links: 5.889141
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 10 11 13 14
## 24 36 91 281 620 1037 704 227 50 11 2 1 1
## 24 least connected regions:
## 53009 53029 25001 44005 36103 51840 51660 6041 51790 51820 51540 51560 6075 51580 51530 51131 51115 51770 51720 51690 51590 27031 26083 55029 with 1 link
## 1 most connected region:
## 49037 with 14 links
This should give you an idea of the distribution of connectedness across the data, with counties having on average nearly 6 neighbours. Now we can generate the row standardise spatial weight matrix and the Moran Scatterplot.
wm <- nb2mat(w, style='B')
rwm <- mat2listw(wm, style='W')
We obtain the Moran’s test for regression residuals using the function lm.morantest() as below. It is important to realize that the Moran’s I test statistic for residual spatial autocorrelation takes into account the fact that the variable under consideration is a residual, computed from a regression. The usual Moran’s I test statistic does not. It is therefore incorrect to simply apply a Moran’s I test to the residuals from the regression without correcting for the fact that these are residuals.
lm.morantest(fit_1, rwm, alternative="two.sided")
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = HR90 ~ RD90 + SOUTH + DV90 + MA90 + PS90 +
## UE90, data = ncovr_sf)
## weights: rwm
##
## Moran I statistic standard deviate = 10.321, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I Expectation Variance
## 0.1093062514 -0.0014498532 0.0001151682
You will notice we obtain a statistically significant value for Moran’s I. The value of the Moran’s I test is not too high, but we still need to keep it in mind. If we diagnose that spatial autocorrelation is an issue, that is, that the errors (the residuals) are related systematically among themselves, then we have a problem and need to use a more appropriate approach: a spatial regression model.
5.38 What to do now?
If the test is significant (as in this case), then we possibly need to think of a more suitable model to represent our data: a spatial regression model. Remember spatial dependence means that there will be areas of spatial clustering for the residuals in our regression model. So our predicted line (or hyperplane) will systematically under-predict or over-predict in areas that are close to each other. That’s not good. We want a better model that does not display any spatial clustering in the residuals.
There are two general ways of incorporating spatial dependence in a regression model, through what we called a spatial error model or by means of a spatially lagged model. There are spdep functions that provides us with some tools to help us make a decision as to which of these two is most appropriate: the Lagrange Multiplier tests.
The difference between these two models is both technical and conceptual. The spatial error model treats the spatial autocorrelation as a nuisance that needs to be dealt with. A spatial error model basically implies that the:
spatial dependence observed in our data does not reflect a truly spatial process, but merely the geographical clustering of the sources of the behaviour of interest. For example, citizens in adjoining neighbohoods may favour the same (political) candidate not because they talk to their neighbors, but because citizens with similar incomes tend to cluster geographically, and income also predicts vote choice. Such spatial dependence can be termed attributional dependence" (Darmofal, 2015: 4)
The spatially lagged model, on the other hand, incorporates spatial dependence explicitly by adding a “spatially lagged” variable y on the right hand side of our regression equation. Its distinctive characteristic is that it includes a spatially lagged “dependent” variable among the explanatory factors. It’s basically explicitly saying that the values of y in the neighbouring areas of observation n~i is an important predictor of y on each individual area n~i . This is one way of saying that the spatial dependence may be produced by a spatial process such as the diffusion of behaviour between neighboring units:
“If so the behaviour is likely to be highly social in nature, and understanding the interactions between interdependent units is critical to understanding the behaviour in question. For example, citizens may discuss politics across adjoining neighbours such that an increase in support for a candidate in one neighbourhood directly leads to an increase in support for the candidate in adjoining neighbourhoods” (Darmofal, 2015: 4)
5.39 Spatial Regimes
Before we proceed to a more detailed description of these two models it is important that we examine another aspect of our model that also links to geography. Remember that when we brought up our data into R, we decided to test for the presence of an interaction. We looked at whether the role of unemployment was different in Southern and Northern states. We found that this interaction was indeed significant. Unemployment had a more significant effect in Southern than in Northern states. This was particularly obvious during the 1970s, when unemployment did not affected homicide rates in the Northern states, but it did led to a decrease in homicide in the Southern states.
We could have attempted to test other interaction effects between some of our other predictors and their geographical location in the South or the North. But we did not.
If you have read the Ballen et al. (2001) paper that we are replicating in last week and this week lab, you will have noticed that they decided that they needed to run separate models for the South and the North. This kind of situation, where sub-regions, seem to display different patterns often is alluded with the name of spatial regimes. In the context of regression analysis, spatial regimes relates to the possibility that we may need to split our data into two (or more sub-regions) in order to run our models, because we presume that the relationship of the predictors to the outcome may play out differently in these sub-regions (spatial regimes).
So how can we assess whether this is an issue in our data? As with many other diagnostics of regression, you may want to start by looking at your residuals. Look at the residual map we produced earlier. Do you think that the residuals look different in the South and in the North? If the pattern is not clear to you, you may want to run other forms of visualisation.
library(ggplot2)
ggplot(ncovr_sf, aes(x = res_fit1, colour = as.factor(SOUTH))) +
geom_density()

5.39.1 HOMEWORK 1
What do you see in this plot? And, critically, what does it mean? What is this telling you about the predicted values that result from our model? (Remember what a residual is: the difference between the observed values and the predicted values).
There are formal tests that one can use to further explore these issues. The paper by Bollen et al. (2001) mentions them (Chow tests). But those are beyond the scope of this course. Sufficient to say that, as Bollen et al. (2001), we are going to split our analysis and run them separately for the Southern and the Northern states. We have covered the filter() function from dplyr to split datasets based on values of a variable. But to split sf objects it is better to rely on the more generic subset function, since filter() doesn’t accommodate well the column with the geographic information that sf provides.
ncovr_s_sf <- subset(ncovr_sf, SOUTH == 1)
ncovr_n_sf <- subset(ncovr_sf, SOUTH == 0)
5.40 Lagrange multipliers
The Moran’s I test statistic has high power against a range of spatial alternatives. However, it does not provide much help in terms of which alternative model would be most appropriate. The Lagrange Multiplier test statistics do allow a distinction between spatial error models and spatial lag models.
In order to practice their computation and interpretation, let’s run two separate OLS regression models (one for the South and one for the North), using the same predictors as we used last week and, first, focusing on homicide in the northern counties in the earliest year for which we have data (1960). We have split the data in two, so that means that before we do this we need to create new files for the spatial weight matrix: in particular we will create one using first order queen criteria.
#We coerce the sf object into a new sp object
ncovr_n_sp <- as(ncovr_n_sf, "Spatial")
#Then we create a list of neighbours using the Queen criteria
w_n <- poly2nb(ncovr_n_sp, row.names=ncovr_n_sp$FIPSNO)
wm_n <- nb2mat(w_n, style='B')
rwm_n <- mat2listw(wm_n, style='W')
fit_2 <- lm(HR60 ~ RD60 + DV60 + MA60 + PS60 +UE60, data=ncovr_n_sf)
First look at the Moran’s I.
lm.morantest(fit_2, rwm_n, alternative="two.sided")
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = HR60 ~ RD60 + DV60 + MA60 + PS60 + UE60, data
## = ncovr_n_sf)
## weights: rwm_n
##
## Moran I statistic standard deviate = 2.84, p-value = 0.004511
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I Expectation Variance
## 0.0394654814 -0.0022539192 0.0002157889
The p (probability) value associated with this Moran’s I is below our standard threshold. So we will say that we have an issue with spatial autocorrelation that we need to deal with. OLS regression won’t do. In order to decide whether to fit a spatial error or a spatially lagged model we need to run the Lagrange Multipliers.
Both Lagrange multiplier tests (for the error and the lagged models, LMerr and LMlag respectively), as well as their robust forms (RLMerr and RLMLag, also respectively) are included in the lm.LMtests function. Again, a regression object and a spatial listw object must be passed as arguments. In addition, the tests must be specified as a character vector (the default is only LMerror), using the c( ) operator (concatenate), as illustrated below.
lm.LMtests(fit_2, rwm_n, test = c("LMerr","LMlag","RLMerr","RLMlag","SARMA"))
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = HR60 ~ RD60 + DV60 + MA60 + PS60 + UE60, data
## = ncovr_n_sf)
## weights: rwm_n
##
## LMerr = 7.1326, df = 1, p-value = 0.007569
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = HR60 ~ RD60 + DV60 + MA60 + PS60 + UE60, data
## = ncovr_n_sf)
## weights: rwm_n
##
## LMlag = 16.595, df = 1, p-value = 4.627e-05
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = HR60 ~ RD60 + DV60 + MA60 + PS60 + UE60, data
## = ncovr_n_sf)
## weights: rwm_n
##
## RLMerr = 8.2906, df = 1, p-value = 0.003985
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = HR60 ~ RD60 + DV60 + MA60 + PS60 + UE60, data
## = ncovr_n_sf)
## weights: rwm_n
##
## RLMlag = 17.753, df = 1, p-value = 2.515e-05
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = HR60 ~ RD60 + DV60 + MA60 + PS60 + UE60, data
## = ncovr_n_sf)
## weights: rwm_n
##
## SARMA = 24.886, df = 2, p-value = 3.946e-06
How do we interpret the Lagrange Multipliers? First we look at the standard ones (LMerr and LMlag). If both are below the .05 level this means we need to have a look at the robust version of these tests (Robust LM). If the non-robust version is not significant, the mathematical properties of the robust tests may not hold, so we don’t look at them in those scenarios. It is fairly common to find that both the lag (LMlag) and the error (LMerr) non-robust LM are significant. If only one of them were, problem solved we would choose a spatial lag or a spatial error model according to this (i.e., if the lag M was significant and the error LM was not we would run a spatial lag model or viceversa). Here we see that both are significant, thus, we need to inspect their robust versions.
Yet we look at the robust Lagrange multipliers (RLMlag and RLMerr) and encounter that both are significant again. What do we do? Luc Anselin (2008: 199-200) proposes the following criteria:
“When both LM test statistics reject the null hypothesis, proceed to the bottom part of the graph and consider the Robust forms of the test statistics. Typically, only one of them will be significant, or one will be orders of magnitude more significant than the other (e.g., p < 0.00000 compared to p < 0.03). In that case the decision is simple: estimate the spatial regression model matching the (most) significant” robust “statistic. In the rare instance that both would be highly significant, go with the model with the largest value for the test statistic. However, in this situation, some caution is needed, since there may be other sources of misspecification. One obvious action to take is to consider the results for different spatial weight and/or change the basic (i.e., not the spatial part) specification of the model. there are also rare instances where neither of the Robust LM test statistics are significant. In those cases, more serious specification problems are likely present and those should be addressed first”.
By other specification errors Prof. Anselin refers to problems with some of the other assumptions of regression that we covered last week.
Ok, so let’s go back to our results and keep this criteria in mind. Both Robust LM are significant, but we can see that the test for the lag model is several orders more significant than for the error model (p < 0.0000251 vs p < 0.004). Notice as well that the test value is higher for the lag model (18 versus 8). In this case we would say that the Lagrange Multiplier tests suggest we estimate a spatial lag model rather than a spatial error model.
5.40.1 HOMEWORK 2
???Run a OLS regression model for homicide rate in 1980 for the Southern states using the covariates we have been using so far and make sure that you ask for the spatial dependence tests and the Lagrange Multipliers (attach the results). What model do you think need to be estimated in this case? The spatial lag or the spatial error model?
5.41 Fitting and interpreting a spatially lagged model
Maximum Likelihood (ML) estimation of the spatial lag model is carried out with the lagsarlm() function. The required arguments are a regression “formula”, a data set and a listw spatial weights object. The default method uses Ord’s eigenvalue decomposition of the spatial weights matrix.
fit_2_lag <- lagsarlm(HR60 ~ RD60 + DV60 + MA60 + PS60 +UE60, data=ncovr_n_sf, rwm_n)
summary(fit_2_lag)
##
## Call:
## lagsarlm(formula = HR60 ~ RD60 + DV60 + MA60 + PS60 + UE60, data = ncovr_n_sf,
## listw = rwm_n)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.86076 -1.54308 -0.55531 0.76832 28.30435
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.802071 0.677236 8.5673 < 2.2e-16
## RD60 1.734746 0.157771 10.9954 < 2.2e-16
## DV60 1.002757 0.077951 12.8640 < 2.2e-16
## MA60 -0.179328 0.020024 -8.9557 < 2.2e-16
## PS60 0.382956 0.077193 4.9610 7.014e-07
## UE60 0.087963 0.030109 2.9215 0.003483
##
## Rho: 0.13749, LR test value: 15.139, p-value: 9.9874e-05
## Asymptotic standard error: 0.035631
## z-value: 3.8587, p-value: 0.000114
## Wald statistic: 14.89, p-value: 0.000114
##
## Log likelihood: -4273.43 for lag model
## ML residual variance (sigma squared): 9.6542, (sigma: 3.1071)
## Number of observations: 1673
## Number of parameters estimated: 8
## AIC: 8562.9, (AIC for lm: 8576)
## LM test for residual autocorrelation
## test value: 10.659, p-value: 0.0010954
As expected, the spatial autoregressive parameter (Rho) is highly significant, as indicated by the p-value of 0.000114 on an asymptotic t-test (based on the asymptotic variance matrix). The Likelihood Ratio test (LR) on this parameter is also highly significant (p value 9.9874e-05).
Just to reiterate, we run the spatial lag model not only because the Lagrange Multiplier suggests this may be appropriate, but also because in this case we believe that the values of y in one county, i, are directly influenced by the values of y that exist in the “neighbors” of i. This is an influence that goes beyond other explanatory variables that are specific to i. Remember what we said earlier in the spatial lag model we are simply adding as an additional explanatory variable the values of y in the surrounding area. What we mean by “surrounding” will be defined by our spatial weight matrix. It’s important to emphasise that one has to think very carefully and explore appropriate definitions of “surrounding” (as we discussed, though just superficially, in the section on spatial clustering a few weeks ago). We are using here the first order queen criteria, but in real practice you would need to explore whether this is the best definition and one that makes theoretical sense.
How do you interpret these results? First you need to look at the general measures of fit of the model. I know what you are thinking. Look at the R Square and compare them, right? Well, don’t. This R Square is not a real R Square, but a pseudo-R Square and therefore is not comparable to the one we obtain from the OLS regression model. Instead we can look at the Akaike Information Criterion. We see that the lag model has a AIC of 8562.9 whereas the linear model with no lags has a AIC of 8576, so this is telling us there is a better fit when we include the spatial lag.
In our spatial lag model you will notice that there is a new term Rho. What is this? This is our spatial lag. It is a variable that measures the homicide rate in the counties that are defined as surrounding each county in our spatial weight matrix. We are simply using this variable as an additional explanatory variable to our model, so that we can appropriately take into account the spatial clustering detected by our Moran’s I test. You will notice that the estimated coefficient for this termis both positive and statistically significant. In other words, when the homicide rate in surrounding areas increases, so does the homicide rate in each country, even when we adjust for the other explanatory variables in our model. The fact the lag is significant adds further evidence that this is a better model than the OLS regression specification.
You also see at the bottom further tests for spatial dependence, a likelihood ratio test. This is not a test for residual spatial autocorrelation after we introduce our spatial lag. What you want is for this test to be significant because in essence is further evidence that the spatial lag model is a good fit.
How about the coefficients? It may be tempting to look at the regression coefficients for the other explanatory variables for the original OLS model and compare them to those in the spatial lag model. But you should be careful when doing this. Their meaning now has changed:
“Interpreting the substantive effects of each predictor in a spatial lag model is much more complex than in a nonspatial model (or in a spatial error model) because of the presence of the spatial multiplier that links the independent variables to the dependent. In the nonspatial model, it does not matter which unit is experiencing the change on the independent variable. The effect” in the dependent variable “of a change” in the value of an independent variable “is constant across all observations” (Darmofal, 2015: 107). Remember we say, when interpreting a regression coefficient for variable X~i, that they indicate how much Y goes up or down for every one unit increase in X~i when holding all other variables in the model constant. In our example, for the nonspatial model this effect is the same for every county in our dataset. But in the spatial lag model things are not the same. We cannot interpret the regression coefficients for the substantive predictors in the same way because the “substantive effects of the independent variables vary by observation as a result of the different neighbors for each unit in the data” (Darmofal, 2015: 107).
In the OLS regression model, the coefficients for any of the explanatory variables measure the absolute impact of these variables. It is a simpler scenario. We look at the effect of X in Y within each county. So X in county A affects Y in count A. In the spatial lag model there are two components to how X affect Y. X affects Y within each county directly but remember we are also including the spatial, the measure of Y in the surrounding counties (call them B, C, and D). So our model includes not only the effect of X in county A in the level of Y in county A. By virtues of including the spatial lag (a measure of Y in county B, C and D) we are indirectly incorporating as well the effects that X has on Y in counties B, C, and D. So the effect of a covariate (independent variable) is the sum of two particular effects: a direct, local effect of the covariate in that unit, and an indirect, spillover effect due to the spatial lag.
In, other words, in the spatial lag model, the coefficients only focus on the “short-run impact” of x~i on y~i , rather than the net effect. As Ward and Gleditsch (2008) explain “Since the value of y~i will influence the level of” homicide “in other” counties y~j and these y~j , in turn, feedback on to y~i , we need to take into account the additional effects that the short impact of x~i exerts on y~i through its impact on the level of" homicide “in other” counties. You can still read the coefficients in the same way but need to keep in mind that they are not measuring the net effect. Part of their effect will be captured by the spatial lag. Yet, you may still want to have a look at whether things change dramatically, particularly in terms of their significance (which is not the case in this example).
5.42 Fitting an interpreting a spatial error model
We saw (well you should have seen!) that for the case of homicide in the 80s for Southern states the spatial error model was more appropriate when using the 1 st order contiguity queen criteria. In this case then, we need to run a spatial error model.
Maximum likelihood estimation of the spatial error model is similar to the lag procedure and implemented in the errorsarlm() function. Again, the formula, data set and a listw spatial weights object must be specified, as illustrated below.
#We coerce the sf object into a new sp object
ncovr_s_sp <- as(ncovr_s_sf, "Spatial")
#Then we create a list of neighbours using the Queen criteria
w_s <- poly2nb(ncovr_s_sp, row.names=ncovr_s_sp$FIPSNO)
wm_s <- nb2mat(w_s, style='B')
rwm_s <- mat2listw(wm_s, style='W')
fit_3_err <- errorsarlm(HR80 ~ RD80 + DV80 + MA80 + PS80 +UE80, data=ncovr_s_sf, rwm_s)
summary(fit_3_err)
##
## Call:errorsarlm(formula = HR80 ~ RD80 + DV80 + MA80 + PS80 + UE80,
## data = ncovr_s_sf, listw = rwm_s)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.70984 -3.46135 -0.56817 2.55031 24.84717
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 11.395096 1.525679 7.4689 8.082e-14
## RD80 3.286154 0.189085 17.3793 < 2.2e-16
## DV80 0.732640 0.133462 5.4895 4.030e-08
## MA80 -0.177938 0.044032 -4.0411 5.319e-05
## PS80 1.540084 0.214637 7.1753 7.216e-13
## UE80 -0.174869 0.066194 -2.6418 0.008248
##
## Lambda: 0.3026, LR test value: 61.006, p-value: 5.6621e-15
## Asymptotic standard error: 0.037726
## z-value: 8.0209, p-value: 1.1102e-15
## Wald statistic: 64.335, p-value: 9.992e-16
##
## Log likelihood: -4384.823 for error model
## ML residual variance (sigma squared): 28.654, (sigma: 5.353)
## Number of observations: 1412
## Number of parameters estimated: 8
## AIC: 8785.6, (AIC for lm: 8844.7)
The spatial lag model is probably the most common specification and maybe the most generally useful way to think about spatial dependence. But we can also enter the spatial dependence as, we mentioned trough the error term in our regression equation. Whereas the spatial lag model sees the spatial dependence as substantively meaningful (in that y, say homicide, in county i is influenced by homicide in its neighbours), the spatial error model simply treats the spatial dependence as a nuisance. This model focuses on estimating the regression parameters for the explanatory variables of interest and disregards the possibility that the spatial clustering, the spatial autocorrelation, may reflect something meaningful (other than attributional dependence as explained earlier). So instead of assuming that a spatial lag influences the dependent variable we estimate a model that relaxes the standard regression model assumption about the need for the errors to be independent. It’s beyond the scope of this introductory course to cover the mathematical details of this procedure, though you can use the suggested reading (particularly the highly accessible Ward and Gleditsch,2008, book or the more recent Darmofal, 2015) or some of the video lectures in the matter that are available in the GeoDa website.
As before the AIC is better for the spatial model (8786) than for the non spatial model (8845). In this case, you can compare the regression coefficients with those from the OLS model, since we don’t have a spatial lag capturing some of their effect. Notice how one of the most notable differences is the fact that unemployment nearly halves its impact in the new model. You will see the table includes a new parameter (lambda) but you don’t need to worry about this for the purpose of the course. It is something you would understand if you get into the mathematical estimation details.
5.42.1 HOMEWORK 3
Estimate an appropriate regression model for the homicide rate for the 1970s for the Northern States. Justify and interpret the model that you have selected.
5.43 Interpreting Spatial Lag Coefficiences
Today we are mostly going to go back to some issues about geographical representation (mapping rates, bins) and other more general issues that are quite revelant when we talk about spatial analysis, and are integral part of a spatial analysis course (e.g., the modifiable area unit problem). But before we do any of that we need to go back to something we mentioned last week about the spatial lag models.
Remember that last week we fitted the following spatial lag model after creating a spatial weight matrix based on Queen first order neighbours:
#Split the data into Southern and Northern counties
ncovr_s_sf <- subset(ncovr_sf, SOUTH == 1)
ncovr_n_sf <- subset(ncovr_sf, SOUTH == 0)
#We coerce the sf object into a new sp object
ncovr_n_sp <- as(ncovr_n_sf, "Spatial")
#Then we create a list of neighbours using the Queen criteria
w_n <- poly2nb(ncovr_n_sp, row.names=ncovr_n_sp$FIPSNO)
wm_n <- nb2mat(w_n, style='B')
rwm_n <- mat2listw(wm_n, style='W')
#Fit model
fit_1_lag <- lagsarlm(HR60 ~ RD60 + DV60 + MA60 + PS60 +UE60, data=ncovr_n_sf, rwm_n)
summary(fit_1_lag)
##
## Call:
## lagsarlm(formula = HR60 ~ RD60 + DV60 + MA60 + PS60 + UE60, data = ncovr_n_sf,
## listw = rwm_n)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.86076 -1.54308 -0.55531 0.76832 28.30435
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.802071 0.677236 8.5673 < 2.2e-16
## RD60 1.734746 0.157771 10.9954 < 2.2e-16
## DV60 1.002757 0.077951 12.8640 < 2.2e-16
## MA60 -0.179328 0.020024 -8.9557 < 2.2e-16
## PS60 0.382956 0.077193 4.9610 7.014e-07
## UE60 0.087963 0.030109 2.9215 0.003483
##
## Rho: 0.13749, LR test value: 15.139, p-value: 9.9874e-05
## Asymptotic standard error: 0.035631
## z-value: 3.8587, p-value: 0.000114
## Wald statistic: 14.89, p-value: 0.000114
##
## Log likelihood: -4273.43 for lag model
## ML residual variance (sigma squared): 9.6542, (sigma: 3.1071)
## Number of observations: 1673
## Number of parameters estimated: 8
## AIC: 8562.9, (AIC for lm: 8576)
## LM test for residual autocorrelation
## test value: 10.659, p-value: 0.0010954
Remember what we said last week:
“Interpreting the substantive effects of each predictor in a spatial lag model is much more complex than in a nonspatial model (or in a spatial error model) because of the presence of the spatial multiplier that links the independent variables to the dependent. In the nonspatial model, it does not matter which unit is experiencing the change on the independent variable. The effect” in the dependent variable “of a change” in the value of an independent variable “is constant across all observations” (Darmofal, 2015: 107). Remember we say, when interpreting a regression coefficient for variable X~i, that they indicate how much Y goes up or down for every one unit increase in X~i when holding all other variables in the model constant. In our example, for the nonspatial model this effect is the same for every county in our dataset. But in the spatial lag model things are not the same. We cannot interpret the regression coefficients for the substantive predictors in the same way because the “substantive effects of the independent variables vary by observation as a result of the different neighbors for each unit in the data” (Darmofal, 2015: 107).
In the OLS regression model, the coefficients for any of the explanatory variables measure the absolute impact of these variables. It is a simpler scenario. We look at the effect of X in Y within each county. So X in county A affects Y in count A. In the spatial lag model there are two components to how X affect Y. X affects Y within each county directly but remember we are also including the spatial, the measure of Y in the surrounding counties (call them B, C, and D). So our model includes not only the effect of X in county A in the level of Y in county A. By virtues of including the spatial lag (a measure of Y in county B, C and D) we are indirectly incorporating as well the effects that X has on Y in counties B, C, and D. So the effect of a covariate (independent variable) is the sum of two particular effects: a direct, local effect of the covariate in that unit, and an indirect, spillover effect due to the spatial lag.
This implies that a change in the ith region’s predictor can affect the jth region’s outcome. We have 2 situations: (a) the direct impact of an observation’s predictor on its own outcome, and (b) the indirect impact of an observation’s neighbor’s predictor on its outcome.This leads to three quantities that we want to know:
-Average Direct Impact, which is similar to a traditional interpretation -Average Total impact, which would be the total of direct and indirect impacts of a predictor on one’s outcome -Average Indirect impact, which would be the average impact of one’s neighbors on one’s outcome
These quantities can be found using the impacts() function in the spdep library. We follow the example that converts the spatial weight matrix into a “sparse” matrix, and power it up using the trW() function. This follows the approximation methods described in Lesage and Pace, 2009. Here, we use Monte Carlo simulation to obtain simulated distributions of the various impacts.
W <- as(rwm_n, "CsparseMatrix")
trMC <- trW(W, type="MC")
im<-impacts(fit_1_lag, tr=trMC, R=100)
sums<-summary(im, zstats=T)
#To print the coefficients
data.frame(sums$res)
## direct indirect total
## 1 1.74081572 0.27045500 2.0112707
## 2 1.00626530 0.15633446 1.1625998
## 3 -0.17995505 -0.02795801 -0.2079131
## 4 0.38429552 0.05970457 0.4440001
## 5 0.08827116 0.01371390 0.1019851
#To print the p values
data.frame(sums$pzmat)
## Direct Indirect Total
## RD60 0.000000e+00 0.0001243143 0.000000e+00
## DV60 0.000000e+00 0.0001031767 0.000000e+00
## MA60 0.000000e+00 0.0002102746 0.000000e+00
## PS60 1.238746e-07 0.0015203493 1.308510e-07
## UE60 5.451867e-03 0.0326507844 5.973215e-03
We see that all the variables have signficant direct, indirect and total effects. You may want to have a look at how things differ when you just run a non spatial model.
fit_1_OLS <- lm(HR60 ~ RD60 + DV60 + MA60 + PS60 +UE60, data=ncovr_n_sf)
summary(fit_1_OLS)
##
## Call:
## lm(formula = HR60 ~ RD60 + DV60 + MA60 + PS60 + UE60, data = ncovr_n_sf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.1581 -1.5986 -0.5770 0.7949 28.4121
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.39963 0.66312 9.651 < 2e-16 ***
## RD60 1.85730 0.15692 11.836 < 2e-16 ***
## DV60 1.11883 0.07604 14.713 < 2e-16 ***
## MA60 -0.19537 0.01976 -9.889 < 2e-16 ***
## PS60 0.37748 0.07782 4.851 1.34e-06 ***
## UE60 0.09277 0.03021 3.071 0.00217 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.132 on 1667 degrees of freedom
## Multiple R-squared: 0.1833, Adjusted R-squared: 0.1809
## F-statistic: 74.83 on 5 and 1667 DF, p-value: < 2.2e-16
#Time matters
This is a fine chapter too if you struggle with the explanations in the required reading. Many universities, like the University of Manchester, have full access to Springer ebooks. You can also have a look at these notes.↩
This is a reasonable explanation of how to interpret R-Squared.↩
This blog post provides a nice animation of the confidence interval and hypothesis testing.↩
This is a nice illustration of the Simpon’s Paradox, a well known example of ommitted variable bias.↩
I recommend reading chapter 13 “Woes of regression coefficients” of an old book Mostseller and Tukey (1977) Data Analysis and Regression. Reading: Addison-Wesley Publishing.↩